Dynamically changing the front end to counter Selenium automation tools

Posted by millikan at 2020-04-14

0x01 Preface

This is not an article about security technology as such. If you follow business security, the fight against the "wool party" (people who abuse promotions and coupons at scale) or anti-crawler work, read on at your leisure.

In business security, the biggest headache is wool-gathering carried out with automated tools of every kind. The wool party has many such tools. Among those that best imitate a real person, the most widely used are built on the Selenium library, which drives real browsers. Selenium's WebDriver supports mainstream browsers such as Chrome, Firefox, IE, Opera, PhantomJS and Safari, and also supports headless browser modes. For more information, see the article: https://www.cnblogs.com/zhao/p/6953241.html

0x02 Demo of an automation tool

Before gathering wool, the wool party prepares its automation tools. For example, a tool is needed to open the browser, open the login page, fill in the account and password, click to log in, open the campaign page, click to grab tickets, and so on.

The following demo uses Selenium and Chrome to log in automatically (or, equally, to run credential stuffing or password brute forcing) against the test website demo.testfire.net.
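A minimal sketch of such a script is given here, written for the Selenium 4 Python bindings. It is an illustration rather than the original listing: the login path `/login.jsp` and the locators `uid`, `passw` and `btnSubmit` are assumptions about the test site's markup, and the names `auto_login` and `main` are illustrative.

```python
# Sketch of an automated login. The URL path and the locators below are
# assumptions about the test site's markup and may need adjusting.
LOGIN_URL = "http://demo.testfire.net/login.jsp"


def auto_login(driver, username, password, url=LOGIN_URL):
    """Open the login page, fill in the form and submit it.

    `driver` is any Selenium WebDriver instance. The strings "id" and
    "name" are the values of By.ID and By.NAME in the Python bindings.
    """
    driver.get(url)
    driver.find_element("id", "uid").send_keys(username)    # account field
    driver.find_element("id", "passw").send_keys(password)  # password field
    driver.find_element("name", "btnSubmit").click()        # login button
    return driver.current_url  # where the browser ended up after submitting


def main():
    """Open a local Chrome through Selenium and run the login once."""
    from selenium import webdriver  # third-party: pip install selenium

    driver = webdriver.Chrome()
    try:
        print(auto_login(driver, "some_user", "some_password"))
    finally:
        driver.quit()
```

Calling `main()` needs Selenium and a local Chrome installation. Every step after `driver.get` depends on finding an element by its ID or name, which is the dependency the rest of the article targets.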

In the above code, webdriver from selenium first opens the local Chrome browser, and its API is then used to open the login address directly. To fill in the account and password, the input elements must be found first. Selenium provides several interfaces for finding elements: by ID, name, XPath, CSS selector, or even text. Once an element is located, it can be clicked, typed into, dragged, and so on.

0x03 Some background knowledge

Element location is the core of automated testing. To operate on an object, you must first identify it. An object is like a person: it has various characteristics (attributes). For example, we can find a person by ID number, by name, or by the street, floor and door number where they live.

Page objects have similar attributes, and we can find an object through them. WebDriver provides a series of element-location methods, including location by ID, name, class name, XPath, CSS selector and link text.

Taking the Baidu search page as an example, the different location methods are called from Python as follows.

The input tag of Baidu search input box is as follows:

<input id="kw" name="wd" class="s_ipt" value="" maxlength="255" autocomplete="off">

Selenium-based automation tools can locate this element in the following ways:

1. Locate by ID

u = dr.find_element_by_id('kw')

2. Locate by name

u = dr.find_element_by_name('wd')

3. Locate by class name

s = dr.find_element_by_class_name('s_ipt')

4. Locate by XPath

s = dr.find_element_by_xpath('//*[@id="kw"]')

5. Locate by CSS selector

s = dr.find_element_by_css_selector('#kw')

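These five methods are the ones most frequently used to locate elements when writing automation tools with Selenium (the others are not listed). In Selenium 4 the same lookups are written as dr.find_element(By.ID, 'kw') and so on; the find_element_by_* helpers shown here were later removed. Either way, automation tools rely heavily on attributes of the input tag such as ID, name and class when searching for elements.

JavaScript's own interfaces for finding elements in the browser, such as document.getElementById, document.getElementsByName and document.getElementsByClassName, likewise use the ID, name and class values to find elements.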

0x04 Thinking about countermeasures

Attacks of this type, which use a driver to make a browser simulate human operation, are hard to stop with traditional JS anti-crawling, UA blacklists and similar measures, because the operations are initiated by a real browser. As described above, the attack is automation driven by WebDriver. Its principle is to inject its own JS into every page of the browser and to perform the automatic clicks and input through that JS.

The first idea is therefore to detect the JS injected by WebDriver, which is what some vendors' current products do to stop automated attacks. Protection product A injects detection JS into every page and recognises automation tools by checking for the function names and variable names that WebDriver brings with it when injecting its JS. This works, but like antivirus software it kills by signature library. Once the signature function and variable names change, for example when the attacker rebuilds the WebDriver driver with modified signatures, the protection product is helpless. This is the same problem antivirus software has always faced.

Let us change our thinking. To operate a page through JS, the elements must be found first, through document.getElementById, document.getElementsByName and similar calls; lookup by ID and name is the most common. Even if WebDriver's JS injection succeeds, if the ID, name and class name values of the tags change dynamically, the injected script fails to locate its elements, which also protects against automation tools.

New idea 1: dynamically change tag attributes to interfere with element lookup in JS

For a protection product, the first stage keeps the original protection mode of detecting the JS injected by WebDriver. It then switches at random to a second stage: it lets WebDriver's JS pass undetected and instead dynamically obfuscates the ID, name and class name values of key tags.

When should the values change? They can change first when the upstream content is first returned. After that we can hook some key events so that the values change automatically whenever the page fires such an event.

New idea 2: randomly insert invisible tags with identical attributes

Analysis of Selenium shows that invisible tags interfere with element location. As mentioned on Selenium's GitHub, newer versions of Selenium can find disabled tags. We can therefore construct tags with the same ID and name attributes that are hidden and disabled. They interfere with the element-location process while the user notices nothing in the UI. This idea also works when the automation script locates elements by CSS selector instead of by ID and name.

Here is an example:

The automation tool can locate the search box above through a structural path, which dynamically changing the ID and name cannot interfere with.

Hidden next to the search box above is the inserted fake tag.

Before insertion, the wool-gathering script avoids ID and name by using the following selector path:

#main > header > div.header-content.clearfix > div.header-search-module > div.header-search-block > input

After the hidden tag is inserted, the script must change its selector path to find the correct input:

#main > header > div.header-content.clearfix > div.header-search-module > div.header-search-block > input:nth-child(2)

The inserted tag is hidden and disabled: it is invisible in the UI and is not submitted to the server with the form. It interferes cleanly with the wool-gathering script's element lookup.
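The decoy idea can also be sketched on the proxy side as a rewrite of the returned HTML. This is a minimal sketch under stated assumptions, not the injected JS used in the demo: the function name `add_decoy_inputs` and the regular-expression approach are chosen for illustration, and a real page would be better handled with an HTML parser.

```python
import re

# Matches an <input ...> tag; good enough for a sketch, not for arbitrary HTML.
_INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
# Double-quoted attribute value; the lookbehind avoids matching e.g. data-id.
_ATTR = r'(?<![\w-]){name}\s*=\s*"([^"]*)"'


def _attr(tag, name):
    """Return the double-quoted value of attribute `name`, or None."""
    m = re.search(_ATTR.format(name=name), tag, re.IGNORECASE)
    return m.group(1) if m else None


def add_decoy_inputs(html, target_ids):
    """Insert a hidden, disabled twin in front of each targeted <input>."""
    def with_decoy(match):
        tag = match.group(0)
        elem_id = _attr(tag, "id")
        if elem_id not in target_ids:
            return tag  # leave untargeted inputs untouched
        name = _attr(tag, "name") or ""
        decoy = f'<input id="{elem_id}" name="{name}" hidden disabled>'
        return decoy + tag  # the decoy comes first in document order
    return _INPUT_TAG.sub(with_decoy, html)


page = '<div><input id="kw" name="wd" class="s_ipt"></div>'
print(add_decoy_inputs(page, {"kw"}))
```

Because the decoy precedes the real input, a lookup that returns the first match by ID or name lands on the disabled twin, and a structural selector has to be changed to something like `input:nth-child(2)` to reach the real field.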

New idea 3: insert invisible or other characters into text

The ideas above interfere with almost all of WebDriver's most commonly used location methods; what remains is location by link text. Selenium supports locating elements by the full or partial text of a link. We can interfere with this by inserting special characters into the original text that cause no obvious change in the page UI.

New idea 4: turn defence into attack

The counterattack idea is to use browser defects and automation-tool bugs, or even vulnerabilities, to fight back. This is the more interesting one. We can go through the issues of selenium, chromedriver and geckodriver on GitHub to collect bugs that users have hit during automated testing (or wool-gathering). We then deliberately construct such conditions to force the wool-gathering script to crash, and in this way disrupt and block the activity.

0x05 Demo

After all this talk, you must be thinking "show me the code". Based on the ideas above, here is the demo. I use mitmproxy to simulate a reverse proxy and write a Python script that modifies the content returned by the origin site and inserts specific JS code. The inserted JS implements the ideas above and hooks some window events, such as onkeypress, onclick and onchange. When these events fire, the page changes dynamically. The changes are not visible in the UI, but they disrupt the automated behaviour of the JS injected by Selenium.

Demo1: dynamically change ID and name values

The mitmproxy script is written as follows: it injects our JS into the login page of a specific domain and places it at the end of the body, so as to minimise the impact on the original page.
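A minimal sketch of such a mitmproxy script is given here. The host is the test site used earlier; the login path prefix `/login`, the placeholder `SCRIPT_TAG` and the helper name `inject_before_body_end` are assumptions for illustration, not values from the original demo. It could be loaded with `mitmdump -s inject.py`, where the file name is arbitrary.

```python
import re

TARGET_HOST = "demo.testfire.net"   # domain whose pages get the extra JS
LOGIN_PATH_PREFIX = "/login"        # assumed path prefix of the login page
# Placeholder: the real protection JS would go inside this tag.
SCRIPT_TAG = "<script>/* injected protection JS goes here */</script>"

_BODY_END = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_before_body_end(html, snippet=SCRIPT_TAG):
    """Place `snippet` just before the last </body>, or append it."""
    matches = list(_BODY_END.finditer(html))
    if not matches:
        return html + snippet
    idx = matches[-1].start()
    return html[:idx] + snippet + html[idx:]


def response(flow):
    """mitmproxy event hook, called for each completed HTTP response."""
    if flow.request.pretty_host != TARGET_HOST:
        return  # other domains pass through unchanged
    if not flow.request.path.startswith(LOGIN_PATH_PREFIX):
        return  # only the login page is rewritten
    if "text/html" not in flow.response.headers.get("content-type", ""):
        return  # skip images, scripts, JSON and so on
    flow.response.text = inject_before_body_end(flow.response.text)
```

Keeping the string rewrite in its own function makes it easy to check without running a proxy, and inserting before the closing body tag means the page's own markup and scripts load first.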

The injected JS is too long to post. Its main logic is as follows:

It traverses the input tags via document.getElementsByTagName and records the ID, name and class name values of each. It defines a change function that generates random ID and name values. It hooks some events so that, when one fires, the change function is called to make one dynamic modification.

Finally, the form's onsubmit must of course be hooked as well: it first restores the real ID, name and class name values and then lets the normal submission proceed.
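The bookkeeping described above can be modelled in a few lines. The real script is JavaScript running in the page; the following Python model only illustrates the record, change and restore logic. The class name `AttributeScrambler` and the dicts standing in for DOM elements are assumptions made for illustration.

```python
import secrets

TRACKED = ("id", "name", "class")  # attributes that automation scripts rely on


class AttributeScrambler:
    """Model of the injected script: remember real values, swap in random ones."""

    def __init__(self, elements):
        # `elements` stands in for the <input> tags found on the page.
        self.elements = elements
        # Record the real values once, before anything is changed.
        self.original = [{k: el.get(k) for k in TRACKED} for el in elements]

    def change(self):
        """Would be called from hooked events (keypress, click, change)."""
        for el in self.elements:
            for key in TRACKED:
                if el.get(key) is not None:
                    el[key] = "x" + secrets.token_hex(4)  # fresh random value

    def restore(self):
        """Would be called from the submit hook, before the real submission."""
        for el, saved in zip(self.elements, self.original):
            for key, value in saved.items():
                if value is not None:
                    el[key] = value


inputs = [{"id": "uid", "name": "uid", "class": "field"}]
guard = AttributeScrambler(inputs)
guard.change()    # an event fired: lookups by the old id or name now miss
guard.restore()   # submit: the server sees the real field names again
```

Restoring before submission matters because the server reads form fields by their name attribute; without it the scrambled names would break the login for real users as well.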

Effect after JS injection:

Whenever the page fires an event, the ID and name of the key input tags change once.

A human user completes the normal business flow without noticing anything.

Test with automation tools as follows:

First, without going through mitmproxy's JS injection, the automatic login succeeds, as shown in the figure below.

Then, accessing the site through the proxy that injects the JS, the automatic login fails.

The console output shows that the automation tool triggered some events while working, the ID and name changed dynamically, and the script could no longer locate the elements or enter the corresponding data.

Demo2: insert identical invisible tags

After the JS is injected, hidden and disabled tags with the same ID and name attributes are inserted:

Normal business must of course remain unaffected, as follows:

The WebDriver test is as follows: the script driving Firefox is successfully disrupted and reports that the element it found is not available.

Demo3: insert invisible or other characters into text

Selenium-type tools support an operation called link location. Sometimes the target is not an input box or a button but a text link, which can be located by its link text.

In this case the automation script does not use ID, name or XPath structure but locates directly by matching text. To interfere, all we can do is dynamically modify the text at these key positions so that the script can no longer use it for location.

For example, in the following test code, the script opens the web page, finds the login link through "sign in" and opens it, finds the contact page through "contact us", and then finds the order page through "online form". This series of automated steps locates elements only through the page text, so the previous ideas cannot interfere with it. If we change "sign in" on the page to "sign. in", keeping the UI change as small as possible, we can again disrupt the automation script.
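The change from "sign in" to "sign. in" can be sketched as a small text transform. The function name `perturb_text` and the default filler are illustrative. Whether an invisible filler, such as a zero-width character, survives a given driver's text normalisation is an open assumption that would need testing.

```python
import random


def perturb_text(text, filler=".", rng=random):
    """Insert `filler` at a random position strictly inside `text`.

    An exact match against the original text then fails, while the
    visible change stays small.
    """
    if len(text) < 2:
        return text  # nothing sensible to do for very short strings
    pos = rng.randrange(1, len(text))  # never at the very start or end
    return text[:pos] + filler + text[pos:]


original = "sign in"
changed = perturb_text(original)
print(repr(changed))          # for example 'sign. in'; the position varies
print(changed == original)    # False: an exact text match no longer hits
```

A partial-text lookup can still succeed when the fragment it searches for happens to be left intact, so the position would need to vary between responses for this to disrupt such scripts reliably.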

The successful interference effect on webdriver is as follows:

This demo is only a proof of concept. It is hard to do well, because I have not yet found a way to modify the text so that it differs for the script while the UI in the browser shows no change.

I would be grateful for advice from front-end experts.

Demo4: turn defence into attack

This idea is not yet fully worked out, and I do not yet have a good way of using JS to attack the client. Here are two bugs. How could they be used from JS to fight back? I look forward to discussing this with the experts. Bug addresses: https://github.com/seleniumhq/selenium/issues/5840, https://github.com/mozilla/geckodriver/issues/1228

0x06 Summary

The idea behind this article was put forward long ago. Only after writing it did I learn that as early as 2016 a senior engineer at Ctrip had written an article on anti-crawling that already mentions it; reading that article is enough. Address: https://blog.csdn.net/u013886628/article/details/51820221. A screenshot of the relevant passage follows:

Clearly Ctrip's anti-crawler work has a long history, and I have been showing off in front of experts. The road ahead is long, and I will keep searching.