# All About XML

## i. Basics of XML

XML (eXtensible Markup Language) is a widely used markup language designed to store and transport data in a structured, human-readable format. Unlike HTML, XML does not use predefined tags (such as `img` or `h1`). Instead, tags can be given any relevant name that describes the data they contain. For example, we can define our own tags to store a book's name, author and number of pages.

XML follows a hierarchical structure, where data is organized in a tree-like format. The fundamental building block of an XML document is an element. An element consists of an opening tag, content, and a closing tag. In the example below, the root element contains three child elements.

![Simple XML Document](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*VJMQf2YNpljfm5-frDTBBw.png)

## ii. XML Entities

XML entities are an essential part of the XML standard. They allow you to represent special characters, predefined entities, or custom entities within an XML document. For example, the entities `&lt;` and `&gt;` represent the characters `<` and `>` respectively. These are metacharacters used to denote XML tags, so they must be written as entities when they appear within data; otherwise they conflict with the XML syntax.

XML provides five predefined entities that represent commonly used characters:

![XML Entities](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*6pC5ROzCxODPd3EhCExjBA.png)

These predefined entities are necessary because their respective characters have special meanings in XML syntax. By using them, we can include these characters in XML content without causing parsing errors.

## iii. Document Type Definition (DTD)

XML documents can contain a document type definition (DTD), which defines the structure of an XML document and the data it contains. It specifies which elements can be used, in what order, and which attributes they can have.

Let's say we have an XML document for a library.
The DTD for this XML document would define its structure, including elements such as book, author, title and publication date. It would also specify rules about the order in which the elements should appear.

![Sample DTD](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*uvJJdZF1i66YVTgSMfW-mQ.png)

A DTD can be loaded from an external source or declared in the document itself (as in the previous example), inside a `DOCTYPE` declaration at the start of the XML document.

## iv. XML Custom Entities

XML custom entities are used within the DTD to represent and reference reusable pieces of data or text. There are two types of custom entities in XML: internal entities and external entities.

**Internal Entities**: An internal entity is defined within the DTD of the document itself. It is declared using the `<!ENTITY>` declaration. The replacement value of an internal entity can be any valid text, including XML markup.

```
]>
Tuhin Bose &dept;
Devang Solanki &dept;
Sivadath KS &dept;
```

In the above example, the entity `dept` is declared as `<!ENTITY dept "Security">`, so its replacement value is "Security" (Line 7). Whenever the `&dept;` entity reference is used within the XML document (Lines 12, 16 and 20), it is replaced with the value "Security".

**External Entities**: An external entity is defined in a separate file and referenced within the XML document. The declaration of an external entity uses the `SYSTEM` keyword and must specify a URL from which the value of the entity should be loaded. The replacement value of an external entity is the content of the referenced file.

![Accessing Remote files](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*epO7eBMoADdfUK3n6WMZkg.png)

So whenever the `website` entity reference is used, it is replaced with the content of `https://bugbase.in/hey.txt`.

The `file://` protocol can also be used to load external entities from local files.
For example:

![Accessing Local files](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*BWej7vJQwCk7HE4JZjiIoA.png)

# XML External Entity Injection (XXE)

## What is XXE?

XML External Entity Injection is a common vulnerability that arises when an application processes user-supplied XML data on the server using a poorly configured XML parser. An attacker can exploit an XXE vulnerability to read arbitrary files from the server, achieve SSRF, perform a denial of service attack and, in some cases, even execute arbitrary commands on the system.

## Why does XXE arise?

Instead of JSON or form data, some applications use XML to transmit data between the client and the server. They generally use a standard library or platform API to process the XML data on the server. XXE arises due to the improper handling of external entities by these XML parsers.

## Exploitation

There are various types of XXE attacks. In this article, we'll try to explain the major ones:

i. **Access Files from the Server:** We can create an external entity containing the contents of a file, and use the entity reference to view them in the response. Let's assume that there is a hospital management portal which checks an admitted patient's details by sending the following HTTP request to the server:

```
POST /checkDetails HTTP/1.1
Host: hospitalportal.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Accept-Encoding: gzip, deflate
Accept-Language: en-US
Pragma: no-cache
Content-Type: application/xml
Content-Length: 110
Upgrade-Insecure-Requests: 1

3 19
```

If the patientId doesn't match any record in the database, the application shows something like "The patientId 19 doesn't exist in our database."; otherwise it returns the details of the patient. Note that the patientId (in this case, 19) is reflected in the response if it doesn't exist in the database.
So, we can create an external entity which contains the contents of `/etc/passwd` and reference that entity inside patientId. Since that patientId won't exist in the database, the application will return something like `The patientId [contents of /etc/passwd] doesn't exist in our database`. So the payload will be:

![Access Sensitive Files from the Server](https://miro.medium.com/v2/resize:fit:828/format:webp/1*GGNdHyFGU3-t1PQbyBf7Iw.png)

ii. **Server Side Request Forgery (SSRF) through XXE**: In addition to retrieving system files, XXE can be used to launch an SSRF attack. For example, to invoke an HTTP request, we can specify the following XML body:

![Invoking HTTP Request](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*UOi0hsuZHXv1Kqy4yzKN1Q.png)

If the application is hosted on an AWS EC2 instance, we can try accessing the [AWS metadata endpoint](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/instancedata-data-retrieval.html).

![Accessing AWS Metadata Endpoint](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*VZzQU9wDcZIobu4dPV2q8A.png)

iii. **XXE via image file upload:** Some applications allow users to upload images, which are then processed or validated on the server side using image processing libraries. Apart from the usual image file formats (such as JPEG or PNG), these libraries might also support SVG images. Since the SVG format is based on XML, an attacker can try to upload a malicious SVG image to trigger an XXE vulnerability. Let's understand the following SVG payload:

![Malicious SVG Image](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*CeCpF0cs9e_c9iTGeybO7Q.png)

The first four lines were already explained in the previous sections. In the next line, we've defined the width and height of the SVG image (in pixels). Next, we've defined the font size used to render the contents of the `/etc/passwd` file (i.e. the content of `&myfile;`) using the `font-size` attribute within the `<text>` tag.
The other attributes, `x` and `y`, define the coordinates at which the text is rendered.

iv. **DoS (Billion Laughs Attack):** We can perform a denial of service attack via XML entities. This attack is also known as the Billion Laughs attack. It occurs when the XML parser keeps expanding entities that are nested inside each other, which overloads the server and results in denial of service. Let's understand the following XML document:

```
<?xml version="1.0"?>
<!DOCTYPE lolz [
 <!ENTITY lol "lol">
 <!ELEMENT lolz (#PCDATA)>
 <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
 <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
 <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
 <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
 <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
 <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
 <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
 <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
 <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
```

First, we have defined an internal entity `lol` as "lol" (line 3). In the 5th line, we've defined an entity `lol1` which references the entity `lol` 10 times, so it expands to "lollollollollollollollollollol". Then we've defined an entity `lol2` which references `lol1` 10 times, so it contains "lol" 100 times. In this way, the entity `lol9` expands to "lol" repeated 10⁹ times. Finally, in the last line, `lol9` is referenced, so the parser tries to expand it into "lol" repeated 10⁹ times. This amount of processing and the size of the resulting string cause a denial of service, as the XML parser quickly exhausts the system's resources.

v. **Remote Code Execution (Rare)**: In some rare cases, the PHP `expect` module may be loaded, which allows us to execute arbitrary commands using the following payload:

![Executing "id" Command](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*TEl3fV67mP9canRygC754A.png)

vi. **Modifying Content-Type**: Some applications accept content types other than the one generated by their frontend HTML form. So we can try to change the content type from the default one (URL-encoded or JSON) to XML. For example, the request body `username=tuhin1729` or `{"username":"tuhin1729"}` can be changed to `<username>tuhin1729</username>`.

vii. **Blind XXE:** Now let's discuss Blind XML External Entity Injection.
In the previous example of the hospital management portal, we assumed that the application returns the value of an element within its responses (if the patientId doesn't match any record in the database, it shows something like "The patientId 19 doesn't exist in our database."). Blind XXE occurs when the application is vulnerable to XXE but does not return the values of any element within its responses. This means that we cannot directly see the output.

To detect a blind XXE, we can trigger an out-of-band network interaction with our Burp Collaborator using the SSRF through XXE technique.

Before we look at the exploitation of blind XXE, we need to understand some basic concepts about XML parameter entities. In some cases, regular XML external entities are blocked because of input validation by the application. In these cases, we may use XML parameter entities instead. XML parameter entities are a special kind of XML entity which can only be referenced elsewhere within the DTD. To declare an XML parameter entity, we need to use the following syntax:

```
<!ENTITY % bugbase "parameter entity value">
```

Parameter entities are referenced using the percent character (`%bugbase;`) instead of the usual ampersand which we used for regular XML external entities.

In case regular XML external entities are blocked, we can detect blind XXE with the following payload:

![Triggering out-of-band network interaction](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*pP2b4aVBRim-UFAUZbe_cw.png)

This payload defines an XML parameter entity `myweb` and then references it within the DTD using `%myweb;`. If the application is vulnerable, this causes a DNS lookup and an HTTP request to bugbase.in (i.e. the attacker's domain).

Generally, there are two ways to exploit blind XXE:

(I) **Exfiltrate data using out-of-band network interactions**: Once we've detected a blind XXE vulnerability, the next step is to exfiltrate sensitive data. For this, we need to host a malicious DTD on our server and invoke the external DTD within the XML payload.
To exfiltrate the contents of the `/etc/passwd` file, save the following DTD on your server as `bugbase.dtd`:

![bugbase.dtd](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*-t3G-evyGH8Wfh5B8FbNHg.png)

Here is the explanation of the above payload:

Line 1: We've defined an XML parameter entity `myfile` which contains the content of the file `/etc/passwd`.

Line 2: We've defined an XML parameter entity `eval` which contains a dynamic declaration of another XML parameter entity, `exfil`. When the `exfil` entity is referenced, it sends an HTTP request to the attacker-controlled domain `attacker.bugbase.in`, carrying the content of `/etc/passwd` (taken from the `myfile` entity) in the GET parameter `data`.

Line 3: Here we've referenced the `eval` entity so that the declaration of the `exfil` entity is performed.

Line 4: Finally, we've referenced the `exfil` entity to invoke the HTTP request.

Now we need to invoke this external DTD within our XML payload.

![Invoking the bugbase.dtd](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*V59pn5_JShe1e6P9Hpr8Rg.png)

The above payload defines an XML parameter entity `tuhin` and references it within the DTD. This causes the XML parser to fetch the external DTD from the attacker's server and process it. The malicious DTD is then interpreted, and the content of `/etc/passwd` is sent to the attacker's server.

(II) **Trigger XML parsing errors to reveal sensitive data**: This technique only works if the application returns the error message in the response. The main idea is to trigger an XML parsing error whose error message contains the sensitive data. We can do so by trying to include a non-existent file using an XML parameter entity.
Let's take a look at the following payload:

![Including non-existent file to cause errors](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*6RfY-87QUOF2Wd5GNuqNEw.png)

In the first line, we've defined an XML parameter entity `myfile` which contains the content of the file `/etc/passwd`. Next, we've defined another XML parameter entity `eval` which contains the declaration of the XML parameter entity `myerror`. The `myerror` entity tries to load a non-existent file whose name contains the value of the `myfile` entity (i.e. the content of `/etc/passwd`). Then, we've referenced the `eval` entity so that the declaration of the `myerror` entity is performed. Finally, we've referenced the `myerror` entity to trigger the error, since the file doesn't exist.

Now we need to invoke this external DTD within our XML payload.

![Invoking the bugbase.dtd](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*Ow-YjGwBJ7_0ZHkG1hPAgA.png)

If the application is written in Java, it'll reflect an error like:

```
java.io.FileNotFoundException: /nonexistent/root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...
```

# Prevention

Generally, XXE vulnerabilities arise due to the improper handling of external entities by XML parsers. The easiest and most effective way to prevent XXE attacks is to disable external entities and support for `XInclude` in the parser. Additionally, a web application firewall (WAF) can be deployed to block XXE payloads. For more detailed prevention strategies, take a look at the [OWASP XXE Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html).
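
As a rough illustration of this advice, here is a minimal Python sketch that goes one step further than disabling external entities: it refuses any document that carries a `DOCTYPE` at all, which rules out both external entities and Billion Laughs style expansion. It uses only the standard library (`xml.parsers.expat` and `xml.etree.ElementTree`). The names `parse_untrusted` and `DoctypeForbidden`, as well as the `patient` and `patientId` element names, are assumptions made up for this example and are not part of any library or of the portal described above. It does not address `XInclude`, and other languages and parsers need their own settings, so treat it as a starting point rather than a complete defence.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration (hypothetical name)."""


def _forbid_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this handler when it reaches the DOCTYPE declaration,
    # before any entity declared in the DTD is expanded in the document body.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted(data: bytes) -> ET.Element:
    """Parse XML bytes, refusing any document that declares a DTD."""
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _forbid_doctype
    checker.Parse(data, True)   # first pass: only checks for a DOCTYPE
    return ET.fromstring(data)  # second pass: build the element tree


if __name__ == "__main__":
    # Element names below are illustrative assumptions.
    benign = b"<patient><patientId>19</patientId></patient>"
    print(parse_untrusted(benign).findtext("patientId"))

    malicious = (
        b'<?xml version="1.0"?>'
        b'<!DOCTYPE patient [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
        b"<patient><patientId>&xxe;</patientId></patient>"
    )
    try:
        parse_untrusted(malicious)
    except DoctypeForbidden as exc:
        print("rejected:", exc)
```

Rejecting the `DOCTYPE` outright is a blunt choice: it also blocks legitimate documents that ship an inline DTD, so it fits best on endpoints that never need one.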