Getting the data I want from Changting's wiki

Posted by tzul at 2020-04-08

Author: myh0st

Free book: Black Hat Python: Python Programming for Hackers and Pentesters

Giveaway page: free books in March

We see the Road of Xinan as a community, and a community's activity and the quality of its technical sharing depend on the people in it. As you all know, we have been giving away books for the past few months, and the articles of the last few days have all been reposts. That is not what I want to see. As an official account built on original content, I really do not want to end up as a platform that can only repost other people's articles, so I am searching the Internet for talented people to join our sharing family.

So, how do we find people who like sharing and might join us?

Usually, people who keep their own blog can be regarded as people who like sharing; their technical ability is secondary. As long as they love sharing, they are the people we are looking for.

So how do we find people with blogs?

There are two wiki platforms of this kind in China. Many good-quality articles have been submitted to them: some were added by the platforms to promote themselves, and some were submitted by readers who came across a good article. That makes them a good resource pool with many people who like sharing, so they are my target.

Today's subject is Changting's wiki.

Get links to all articles on the wiki platform

This work cannot be done by hand; it needs tools. Should I write my own?

As a lazy person who has not written code for a long time, I will use existing tools instead. The tools used here are Burp and EmEditor.

Open the wiki, as shown:

The page number in the URL can be enumerated, so we mark it as a parameter in Burp and iterate over it, as shown in the figure below:

After the traversal finishes, save the responses. The figure below shows the save options.
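
For readers who would rather script this step than use Burp, here is a minimal Python sketch of the same idea: build one URL per page number and keep every response body. The URL template and the names `page_urls`, `http_get`, and `fetch_pages` are assumptions for illustration; the post does not give the real address or parameter name.

```python
from typing import Callable, Dict, List
from urllib.request import urlopen

# Hypothetical URL pattern. The real wiki address and its page parameter
# are not given in the post, so substitute your own.
PAGE_URL_TEMPLATE = "https://wiki.example.com/list?page={page}"


def page_urls(first: int, last: int, template: str = PAGE_URL_TEMPLATE) -> List[str]:
    """Build one URL per page number, from first to last inclusive."""
    return [template.format(page=n) for n in range(first, last + 1)]


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_pages(urls: List[str], fetch: Callable[[str], str] = http_get) -> Dict[str, str]:
    """Return {url: body} for every URL, similar to saving the responses."""
    return {url: fetch(url) for url in urls}


# Example use (performs real requests, so it is left commented out):
# pages = fetch_pages(page_urls(1, 50))
```

Passing the fetch function as a parameter keeps the traversal logic separate from the network call, which makes it easy to try the loop with a stub before pointing it at a real site.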

After saving, we turn to EmEditor, my favorite editor thanks to its powerful features, and extract all the short links, as shown in the following figure:


As the figure above shows, the short links follow a regular pattern, so you only need to extract the lines containing <a href="/url, as shown in the figure below.

The figure after that shows the extracted results.
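
The same extraction can be sketched in Python with a regular expression built on the `<a href="/url` prefix mentioned above. The sample markup and the exact shape of the short-link paths are made up for illustration, and `extract_short_links` is a hypothetical helper name.

```python
import re
from typing import List

# Matches the short-link prefix named in the text: <a href="/url...
# Everything up to the closing quote is captured as the short-link path.
SHORT_LINK_RE = re.compile(r'<a href="(/url[^"]*)"')


def extract_short_links(html: str) -> List[str]:
    """Return the unique short-link paths in order of first appearance."""
    seen = set()
    links = []
    for path in SHORT_LINK_RE.findall(html):
        if path not in seen:
            seen.add(path)
            links.append(path)
    return links


# Made-up sample markup; the real pages will look different.
sample_html = """
<a href="/url/abc123">Post one</a>
<a href="/about">About</a>
<a href="/url/def456">Post two</a>
<a href="/url/abc123">Post one again</a>
"""

print(extract_short_links(sample_html))
```

Links that do not start with `/url` are ignored and duplicates are dropped, so the output can be saved directly as the input list for the next step.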

Save the extracted results to a file, then set up the next round of enumeration in Burp with that file, as shown in the figure:

Once it is set up, start the run and save the results, as shown in the figure:

Extract the lines containing location: and strip everything except the URL to get all the links, as shown in the figure.
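
As a rough Python equivalent of this step, the sketch below scans saved raw responses for `Location` header lines and keeps only the target URLs. The sample responses, including the 302 status and the host names, are invented for illustration, and `extract_locations` is a hypothetical helper name.

```python
from typing import List


def extract_locations(raw_responses: str) -> List[str]:
    """Collect the value of every Location header found in raw HTTP text."""
    targets = []
    for line in raw_responses.splitlines():
        # Split on the first colon only, so the URL's own colons survive.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            targets.append(value.strip())
    return targets


# Made-up saved responses; header names are matched case-insensitively.
sample_responses = (
    "HTTP/1.1 302 Found\r\n"
    "Location: https://blog.example.com/post/1\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
    "HTTP/1.1 302 Found\r\n"
    "location: http://another.example.org/a?b=c\r\n"
    "\r\n"
)

print(extract_locations(sample_responses))
```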


Analyze the acquired link data

After getting the results, we need statistics on the websites involved, to see which sites appear most often and have the most articles submitted. Two Linux commands are enough here: sort and uniq.

1. Extract the domain names from the results

Use EmEditor's regular-expression replace to remove the scheme before the domain name with the pattern http[s]?://, then remove everything after the domain name with the pattern /.* so that only the domain name is left on each line. The figure below shows the result.
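
The two replacements can also be sketched in Python with `re.sub`, using the same two patterns. The function name `extract_domain` and the sample URLs are assumptions for illustration.

```python
import re


def extract_domain(url: str) -> str:
    """Apply the two replacements from the text to a single URL."""
    # Step 1: remove the scheme in front of the domain name.
    without_scheme = re.sub(r"http[s]?://", "", url.strip(), count=1)
    # Step 2: remove the first slash and everything after it.
    return re.sub(r"/.*", "", without_scheme)


# Made-up sample links.
sample_links = [
    "https://blog.example.com/post/1?x=2",
    "http://example.org",
    "https://blog.example.com/archive/2020/",
]

print([extract_domain(link) for link in sample_links])
```

Writing one domain per line to links.txt gives the input that the sort and uniq commands in the next step expect.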

2. Sort and count the above results

You can use the following command to process the file:

sort links.txt | uniq -c > 1.txt

sort -r -k 1 -n 1.txt > 2.txt
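
For comparison, the same counting and ranking can be sketched in Python with `collections.Counter`. This is an illustrative equivalent, not a replacement for the commands above: `rank_domains` is a hypothetical helper, the sample domains are made up, and ties are ordered alphabetically here, which may differ from the order `sort` produces.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_domains(domains: Iterable[str]) -> List[Tuple[str, int]]:
    """Count each domain and return (domain, count) pairs, most frequent first."""
    counts = Counter(d.strip() for d in domains if d.strip())
    # Sort by count descending, then by name so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample input, one domain per entry as in links.txt.
sample_domains = [
    "a.example", "b.example", "a.example",
    "c.example", "a.example", "b.example",
]

for domain, count in rank_domains(sample_domains):
    print(count, domain)
```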

The processing results are as follows:

Now I can go through these blogs. If your content is good, I will get in touch with you.