Published on April 12, 2017 | Filed under Crawler Technology
It all started because I wanted to find some classic movies to enjoy, so I asked a veteran "old driver" for resources (I stressed that I wanted regular videos, definitely not the kind of resources he had in mind). He gave me the address of a video resource website and said it was a well-known one. I took his word for it and excitedly opened it to search for classic movies, which led to a classic battle with Baidu Netdisk.

Disclaimer: the tools in this article are only for personal testing and research. Please delete them within 24 hours of downloading, and do not use them for commercial or illegal purposes; otherwise you bear the consequences yourself. The screenshots in this article are for demonstration only. Do not use them illegally.

First, a look at the video website. I have to say, this is a regular website with regular videos, judging by the titles alone. With a great thirst for knowledge, I clicked on a link and saw the video resource links at the bottom of the page. There are two kinds of resources here: Baidu Netdisk shares and Xunlei seeds. It has to be said that this website is conscientious compared with sites that only post pictures and leave no seeds.

By normal logic, at this point I should have clicked the resource address and enjoyed it quietly (no, actually I am not that kind of person), so instead I silently saved the resource to my netdisk collection. Seeing several more masterpieces in my netdisk made me very happy. But adding just a few works didn't satisfy my urge to collect, so I began to explore how to add video resources to Baidu Netdisk quickly, which triggered a series of battles with it.
Prelude to war
First, after observing the URL structure of the website and the source code of its pages, I decided to use a crawler to collect the resource link addresses. This step posed no big problem. I collected the data with Python plus coroutines, and soon had part of the Baidu Netdisk resource addresses.
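As a rough illustration of the collection step, here is a minimal sketch that pulls Baidu share links out of a page's HTML. It is not the script used in this article: the function name `extract_share_links` and the link pattern (share links of the form `pan.baidu.com/s/<id>`) are assumptions for the example, and fetching the pages is left out.

```python
import re

# Assumed pattern: share links look like http(s)://pan.baidu.com/s/<id>
SHARE_LINK_RE = re.compile(r"https?://pan\.baidu\.com/s/[0-9A-Za-z_-]+")


def extract_share_links(html):
    """Return the unique Baidu share links found in html, in order of appearance."""
    links = []
    for link in SHARE_LINK_RE.findall(html):
        if link not in links:
            links.append(link)
    return links


# Made-up sample HTML containing the same link twice
sample_html = (
    '<a href="http://pan.baidu.com/s/1o8lkapc">netdisk</a> '
    '<a href="http://pan.baidu.com/s/1o8lkapc">netdisk again</a>'
)
share_links = extract_share_links(sample_html)
print(share_links)
```

Deduplicating while keeping the original order makes it easier to resume a partial run later.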
By the time I had written the collection script and gathered part of the data, it was 11:00 p.m., and I should have washed up and gone to sleep. However, the urge to explore the technology pushed me on. I now had the resource addresses, but each Baidu Netdisk share still had to be opened one by one and added to my own netdisk. That step is too labor-intensive, so I decided to keep digging for a way to add resources to Baidu Netdisk automatically.
Note: the following is the key technical content of this article, and it decides the outcome of this first battle with Baidu Netdisk. Please stay with me and read on.
Ultimate battle
First, I analyzed the characteristics of Baidu's share page by capturing packets, viewing the source code, inspecting elements, and so on, to judge whether it was suitable for crawling. After a series of tests, I found that although the process is a little tortuous, a crawler can still add resources to the netdisk automatically.
To implement this, I summarized the following steps:
- Get the user's cookie (log in manually and capture the packets).
- Crawl the netdisk share page, such as http://pan.baidu.com/s/1o8lkapc, to obtain its source code.
- Parse the source code and extract the shared resource's name, shareid, from (uk), bdstoken, and app ID (app_id).
- Construct the POST request that adds the resource to the netdisk, which requires the parameters above plus the cookie.
Get cookie
There are many tools for capturing cookies. I use Firefox's Tamper plug-in. Capture the login request first: the request sent at login contains the account and password. What we need here, of course, is the cookie, which can be seen in the response.
The format of the cookie is as follows:
BAIDUID=52C3FE49FD82573C4ABCEAC5E77800F6:FG=1;
BIDUPSID=52C11E49FD82573C4ABCEAC5E778F0F6;
PSTM=1421697115; PANWEB=1; Hm_lvt_7a3960b6f067eb0085b7196ff5e660b0=1491987412; Hm_lpvt_7a3960b6f067eb0085b7f96ff5e6260b0=1491988544;
STOKEN=3f84d8b8338c58f127c29e3eb305ad41f7c68cefafae166af20cfd26f18011e8;
SCRC=4abe70b0f9a8d0ca15a5b9d2dca40cd6;
PANPSC=16444630683646003772%3AWaz2A%2F7j1vWLfEj2viX%2BHun90oj%2BY%2FIsAxoXP3kWK6VuJ5936qezF2bVph1S8bONssvn6mlYdRuXIXUCPSJ19ROAD5r1J1nbhw55AZBrQZejhilfAWCWdkJfIbGeUDFmg5zwpdg9WqRKWDBCT3FjnL6jsjP%2FyZiBX26YfN4HZ4D76jyG3uDkPYshZ7OchQK1KQDQpg%2B6XCV%2BSJWX9%2F9F%2FIkt7vMgzc%2BT;
BDUSS=VJxajNlVHdXS2pVbHZwaGNIeWdFYnZvc3RMby1JdFo5YTdOblkydkdTWlVmUlZaSVFBQUFBJCQAAAAAAAAAAAEAAAA~cQc40NLUy7XEwbm359PwABAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFTw7VhU8O1Yb
Since this cookie involves a personal account, I have altered the values, but the format is the same.
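To work with a cookie in this format, it can help to split it into name/value pairs. The following is a minimal sketch; the helper name `parse_cookie_string` is an assumption for illustration, and the sample uses only a few of the (altered) values shown above.

```python
def parse_cookie_string(raw):
    """Split a 'name=value; name=value' cookie string into a dict."""
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty fragments such as a trailing ';'
        # Split on the first '=' only, because values may contain '=' themselves
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


raw_cookie = "BAIDUID=52C3FE49FD82573C4ABCEAC5E77800F6:FG=1; PSTM=1421697115; PANWEB=1;"
cookies = parse_cookie_string(raw_cookie)
print(cookies["BAIDUID"])
```

Splitting on the first `=` matters here because a value such as the one for BAIDUID contains an `=` of its own.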
Visit Baidu resource sharing page
After obtaining the cookie, I can visit a Baidu share page, for example http://pan.baidu.com/s/1o8lkapc, with the cookie value written into the request headers so that the request is treated as logged in. I failed several times at this step because other header parameters also need to be added (if the cookie parameter is missing, the returned result is "the page does not exist"). Once the request succeeds, the source code contains the content we need: the shared resource's name, shareid, from (uk), bdstoken, and app ID (app_id).
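The extraction step could be sketched as below. This is only an illustration under a loud assumption: it supposes that each value appears in the page source as a `"key":value` or `"key":"value"` pair inside an inline script, which this article does not show. The function name `extract_share_params` and the sample source are made up; the patterns would need adjusting to whatever the real page contains.

```python
import re


def extract_share_params(page_source):
    """Pull shareid, uk, bdstoken, and app_id out of the page source.

    Assumes each value appears as "key":value or "key":"value".
    Missing keys are returned as None.
    """
    params = {}
    for key in ("shareid", "uk", "bdstoken", "app_id"):
        match = re.search(r'"%s"\s*:\s*"?([0-9A-Za-z]+)"?' % key, page_source)
        params[key] = match.group(1) if match else None
    return params


# Made-up page fragment, reusing the values from the captured request in this article
sample_source = (
    'var data = {"shareid":2337815987,"uk":1612775008,'
    '"bdstoken":"6e05f8ea7dcb04fb73aa975a4eb8ae6c","app_id":"250528"};'
)
share_params = extract_share_params(sample_source)
print(share_params)
```

Returning `None` for a missing key makes it easy to detect a failed login, since a page fetched without a valid cookie would not contain these values.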
Construct add resource post package
First, let's look at the structure of the post package:
POST https://pan.baidu.com/share/transfer?shareid=2337815987&from=1612775008&bdstoken=6e05f8ea7dcb04fb73aa975a4eb8ae6c&channel=chunlei&clienttype=0&web=1&app_id=250528&logid= HTTP/1.1
Host: pan.baidu.com
Connection: keep-alive
Content-Length: 169
Accept: */*
Origin: https://pan.baidu.com
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: https://pan.baidu.com/s/1kUOxT0V?errno=0&errmsg=Auth%20Login%20Sucess&&bduss=&ssnerror=0
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.8,en;q=0.6
Cookie:
filelist=["/test.rar"]&path=/
The URL of the POST request carries several parameters; just fill in the values obtained earlier. There is also a logid parameter, whose content can be anything; it appears to be a random value that has been Base64-encoded. In the payload, filelist is the resource name, in the format filelist=["/name.mp4"], and path is the directory to save into, in the format path=/directory. The Cookie header must be filled in with the cookie value obtained earlier.
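Putting the pieces of the captured request together, a minimal sketch of building the POST request might look like this. The function name `build_transfer_request` and its parameters are assumptions for illustration, the logid is simply random bytes encoded with Base64 (following the observation that its content is arbitrary), and the cookie in the usage example is a placeholder. The sketch only builds the URL, headers, and body; sending them is left to whichever HTTP client you prefer.

```python
import base64
import json
import os
from urllib.parse import parse_qs, urlencode


def build_transfer_request(share_url, shareid, uk, bdstoken, app_id,
                           file_path, cookie, save_path="/"):
    """Build the URL, headers, and form body for the share/transfer POST."""
    query = urlencode({
        "shareid": shareid,
        "from": uk,
        "bdstoken": bdstoken,
        "channel": "chunlei",
        "clienttype": 0,
        "web": 1,
        "app_id": app_id,
        # Assumption: any random Base64 string is accepted as logid
        "logid": base64.b64encode(os.urandom(16)).decode("ascii"),
    })
    url = "https://pan.baidu.com/share/transfer?" + query
    headers = {
        "X-Requested-With": "XMLHttpRequest",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Referer": share_url,
        "Cookie": cookie,
    }
    # filelist is a JSON array of paths; urlencode percent-encodes it for the form body
    body = urlencode({
        "filelist": json.dumps([file_path], ensure_ascii=False),
        "path": save_path,
    })
    return url, headers, body


url, headers, body = build_transfer_request(
    share_url="https://pan.baidu.com/s/1kUOxT0V",
    shareid="2337815987",
    uk="1612775008",
    bdstoken="6e05f8ea7dcb04fb73aa975a4eb8ae6c",
    app_id="250528",
    file_path="/test.rar",
    cookie="BAIDUID=placeholder; BDUSS=placeholder",  # placeholder, not a real cookie
)
print(url)
print(parse_qs(body))
```

Building the query and the body with `urlencode` keeps names containing Chinese characters or spaces safe without manual escaping.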
Final return content
{"errno":0,"task_id":0,"info":[{"path":"\/\u5a31\u4e50\u6e38\u620f\/\u4e09\u56fd\u5168\u6218\u6218\u68cb1.4\u516d\u53f7\u7248\u672c.rar","errno":0}],"extra":{"list":[{"from":"\/\u5a31\u4e50\u6e38\u620f\/\u4e09\u56fd\u5168\u6218\u6218\u68cb1.4\u516d\u53f7\u7248\u672c.rar", "to":"\/\u4e09\u56fd\u5168\u6218\u6218\u68cb1.4\u516d\u53f7\u7248\u672c.rar"}]}}
If you see the content above, the resource has been added to the netdisk successfully. If errno has any other value, an error has occurred; 12 indicates that the resource already exists.
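A script can check the returned JSON in the same way. The sketch below covers only the two errno values mentioned here (0 and 12); the function name `describe_transfer_result` is an assumption for illustration, and any other value is reported as a generic failure because its meaning is not documented in this article.

```python
import json


def describe_transfer_result(response_text):
    """Summarize the JSON returned by the transfer request."""
    result = json.loads(response_text)
    errno = result.get("errno")
    if errno == 0:
        saved = [item["path"] for item in result.get("info", [])]
        return "saved: " + ", ".join(saved)
    if errno == 12:
        return "resource already exists"
    return "failed with errno %s" % errno


ok_message = describe_transfer_result('{"errno":0,"task_id":0,"info":[{"path":"/test.rar","errno":0}]}')
duplicate_message = describe_transfer_result('{"errno":12}')
print(ok_message)
print(duplicate_message)
```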
Record
After nearly an hour I finished writing the code; most of that time went to debugging and studying the packets. I ran into many pitfalls along the way, but they were all solved in the end. Now I can enjoy the pleasure of watching the program run and the results appear in Baidu Netdisk.
By the time I finished writing this article, it was almost midnight. I have only run a small part of the video resources so far, and the rest will continue tomorrow. All this just to watch a few videos; it isn't easy for me.
I will release the source code tomorrow. For today, I will share my netdisk: https://pan.baidu.com/s/1nvz74vn
Project GitHub address: https://github.com/tengzhangchao/baidupan