XML to JSON: Handling Nested Data Structures Like a Pro
Converting XML to JSON sounds straightforward until you encounter real-world XML documents. Attributes, mixed content, namespaces, CDATA sections, and the fundamental mismatch between XML's document model and JSON's data model create challenges that trip up even experienced developers.
This guide walks through the key structural differences, common edge cases, and practical solutions for converting between these two ubiquitous data formats.
XML vs JSON: Structural Differences
Before diving into conversion strategies, understanding how XML and JSON represent data differently is essential:
| Feature | XML | JSON |
|---|---|---|
| Data model | Document/tree | Data/object |
| Attributes | Supported on elements | No equivalent concept |
| Ordering | Element order is significant | Object keys are unordered |
| Data types | Everything is text | String, number, boolean, null, array, object |
| Namespaces | Full namespace support | No built-in concept |
| Comments | Supported | Not allowed |
| Mixed content | Text + elements together | Not naturally supported |
Mapping XML Elements to JSON Objects
The most basic conversion maps XML elements to JSON object keys and text content to string values:
Simple Element Mapping
<!-- XML -->
<user>
  <name>Alice</name>
  <age>30</age>
  <email>[email protected]</email>
</user>
// JSON
{
  "user": {
    "name": "Alice",
    "age": "30",
    "email": "[email protected]"
  }
}
Note that age remains a string in JSON because XML has no native type system. Smart converters will attempt type inference — converting "30" to the number 30 — but this can cause issues when values like ZIP codes ("07024") should stay as strings.
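The trade-off can be made concrete with a small sketch of conservative type inference. The helper name `infer_scalar` and its `keep_as_string` flag are hypothetical, not part of any library; the rule it assumes is that a value with a leading zero is an identifier rather than a number.

```python
import re

# Integers without a leading zero (so "07024" is left alone) and simple decimals.
_INT = re.compile(r"-?(0|[1-9]\d*)")
_FLOAT = re.compile(r"-?(0|[1-9]\d*)\.\d+")


def infer_scalar(text, keep_as_string=False):
    """Convert XML text to a JSON-friendly Python value (hypothetical helper)."""
    if keep_as_string:
        return text  # caller knows this field is an identifier, not a number
    stripped = text.strip()
    if stripped in ("true", "false"):
        return stripped == "true"
    if _INT.fullmatch(stripped):
        return int(stripped)
    if _FLOAT.fullmatch(stripped):
        return float(stripped)
    return text  # anything else stays a string, unchanged


print(infer_scalar("30"))      # 30
print(infer_scalar("07024"))   # 07024 (still a string)
print(infer_scalar("999.99"))  # 999.99
```

Even a conservative rule like this can guess wrong (a numeric product code without a leading zero would still become a number), so a per-field override such as `keep_as_string` is usually worth having.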
Handling XML Attributes
XML attributes have no direct JSON equivalent. Two common conventions are to prefix attribute keys with @ or to group them under an _attributes key:
<!-- XML with attributes -->
<product id="101" category="electronics">
  <name>Laptop</name>
  <price currency="USD">999.99</price>
</product>
Convention 1: @ Prefix (Most Common)
{
  "product": {
    "@id": "101",
    "@category": "electronics",
    "name": "Laptop",
    "price": {
      "@currency": "USD",
      "#text": "999.99"
    }
  }
}
Convention 2: Grouped Attributes
{
  "product": {
    "_attributes": { "id": "101", "category": "electronics" },
    "name": "Laptop",
    "price": {
      "_attributes": { "currency": "USD" },
      "_text": "999.99"
    }
  }
}
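The @ prefix convention is widely adopted because it keeps the JSON structure flat and readable. It is the default in Python's xmltodict, and fast-xml-parser for Node.js defaults to a similar @_ prefix. Not every library follows it: xml2js, for example, groups attributes under a $ key by default.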
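To show how the @ prefix mapping works mechanically, here is a minimal sketch built on Python's standard-library ElementTree. The function name `element_to_dict` is made up for illustration, and the sketch deliberately ignores repeated child tags (a later sibling with the same name would overwrite an earlier one).

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element using the "@" prefix / "#text" convention (sketch)."""
    # Attributes become "@name" keys.
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    # Child elements become nested keys (repeated tags are not handled here).
    for child in elem:
        node[child.tag] = element_to_dict(child)
    text = (elem.text or "").strip()
    if not node:
        return text  # text-only element without attributes -> plain string
    if text:
        node["#text"] = text  # text next to attributes or children
    return node


xml = """
<product id="101" category="electronics">
  <name>Laptop</name>
  <price currency="USD">999.99</price>
</product>
"""
root = ET.fromstring(xml)
result = {root.tag: element_to_dict(root)}
print(json.dumps(result, indent=2))
```

Run against the product document above, this should reproduce the Convention 1 output, with `price` carrying both `@currency` and `#text`.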
The Array vs Single Element Problem
This is arguably the trickiest edge case in XML-to-JSON conversion. In XML, you can have one or many child elements with the same name:
<!-- One item -->
<order>
  <item>Widget</item>
</order>
<!-- Multiple items -->
<order>
  <item>Widget</item>
  <item>Gadget</item>
  <item>Doohickey</item>
</order>
A naive converter produces inconsistent output:
// Single item → string
{ "order": { "item": "Widget" } }
// Multiple items → array
{ "order": { "item": ["Widget", "Gadget", "Doohickey"] } }
This inconsistency breaks client code that assumes one shape: a loop written for the array case fails, or silently iterates over characters, when it receives a single string. The solution is to use a "force array" option that always wraps repeatable elements in arrays:
// Consistent output with force array
{ "order": { "item": ["Widget"] } }
{ "order": { "item": ["Widget", "Gadget", "Doohickey"] } }
Most conversion libraries offer this as a configuration option. If you have an XML Schema (XSD), the maxOccurs attribute tells you which elements can repeat: anything declared with maxOccurs greater than 1, or unbounded, should be forced to an array.
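The force-array idea can be sketched in a few lines with the standard-library ElementTree. The function `children_to_dict` and its `force_list` parameter are illustrative names (the parameter is modeled loosely on the option of the same name in xmltodict), and attributes are ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET


def children_to_dict(elem, force_list=()):
    """Group child elements by tag; tags in force_list always become lists."""
    result = {}
    for child in elem:
        # Recurse into elements that have children; otherwise take the text.
        value = children_to_dict(child, force_list) if len(child) else (child.text or "")
        if child.tag in result:
            # Second or later occurrence: promote to a list if needed, then append.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        elif child.tag in force_list:
            result[child.tag] = [value]  # wrap even a single occurrence
        else:
            result[child.tag] = value
    return result


single = ET.fromstring("<order><item>Widget</item></order>")
multiple = ET.fromstring(
    "<order><item>Widget</item><item>Gadget</item><item>Doohickey</item></order>"
)

naive_single = children_to_dict(single)                          # item is a string
forced_single = children_to_dict(single, force_list=("item",))   # item is a list
forced_multiple = children_to_dict(multiple, force_list=("item",))
print(naive_single, forced_single, forced_multiple)
```

With `force_list=("item",)`, both documents yield a list under `item`, so client code can iterate without checking the type first.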
Handling Namespaces
XML namespaces prevent element name collisions when combining documents from different sources:
<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr><h:td>Cell</h:td></h:tr>
  </h:table>
  <f:table>
    <f:material>Oak</f:material>
  </f:table>
</root>
Common strategies for representing namespaces in JSON:
- Preserve prefixes: Use keys like "h:table" (simple, but loses the URI mapping)
- Use full URIs: Use keys like "{http://www.w3.org/TR/html4/}table" (precise but verbose)
- Strip namespaces: Remove prefixes entirely (works when there are no conflicts)
- Namespace map: Add a separate "_namespaces" object mapping prefixes to URIs
CDATA Sections
CDATA sections in XML contain text that should not be parsed as markup:
<script><![CDATA[
if (a < b && b > c) {
  console.log("Valid");
}
]]></script>
In JSON, CDATA content becomes a regular string value. The key consideration is that special characters (<, >, &) should remain as literal characters in the JSON string, not their XML entity equivalents:
{
  "script": "if (a < b && b > c) {\n  console.log(\"Valid\");\n}"
}
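A quick way to check this behavior is to parse a CDATA section and serialize the result. The sketch below uses only the Python standard library; trimming the surrounding whitespace with `strip()` is a choice made for this example, not something the parser does on its own.

```python
import json
import xml.etree.ElementTree as ET

xml = """<script><![CDATA[
if (a < b && b > c) {
  console.log("Valid");
}
]]></script>"""

root = ET.fromstring(xml)
# The parser returns CDATA content as ordinary text, with no entity escaping.
code = root.text.strip()
payload = json.dumps({root.tag: code})
print(payload)

# Entity-escaped text decodes to the same literal characters.
escaped = ET.fromstring("<s>a &lt; b &amp;&amp; b &gt; c</s>").text
print(escaped)  # a < b && b > c
```

Only the double quotes and newlines are escaped in the JSON output, because those are the characters JSON string syntax requires escaping.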
Tools and Libraries for Conversion
Python
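import json

import xmltodict

# Assumes data.xml is UTF-8; adjust encoding= to match the file
with open('data.xml', encoding='utf-8') as f:
    # force_list makes every <item> a list, even when only one is present
    xml_dict = xmltodict.parse(f.read(), force_list=('item',))

json_str = json.dumps(xml_dict, indent=2)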
JavaScript / Node.js
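const { XMLParser } = require('fast-xml-parser');

// Sample input (illustrative)
const xmlString = '<order id="1"><item>Widget</item></order>';

const parser = new XMLParser({
  ignoreAttributes: false,     // keep attributes (they are dropped by default)
  attributeNamePrefix: '@_',   // attribute keys become "@_id", "@_category", ...
  // Always parse these tags as arrays, even when only one is present
  isArray: (name) => ['item', 'record'].includes(name)
});

const jsonObj = parser.parse(xmlString);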
Java
import org.json.JSONObject;
import org.json.XML;

String xmlString = "<root><name>Test</name></root>";
// Elements map to keys, giving {"root":{"name":"Test"}}
JSONObject jsonObj = XML.toJSONObject(xmlString);
Edge Cases and Solutions
- Empty elements: <tag/> can become null, "", or {} depending on the converter. Choose one convention and stick to it.
- Mixed content: Elements with both text and child elements need special handling, typically using #text as a key for the text content. A single #text key cannot record where the text sat relative to the child elements, so ordering is lost unless the converter emits an ordered list.
- XML declaration and processing instructions: The <?xml version="1.0"?> declaration and any processing instructions are typically dropped during conversion
- Comments: XML comments are discarded since JSON doesn't support them
- Whitespace: Significant whitespace in XML may be trimmed during conversion, which matters for preformatted content
- Encoding: JSON exchanged between systems is expected to be UTF-8 (RFC 8259), while XML can declare other encodings, so transcode where necessary
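Mixed content and empty elements are easier to reason about with a concrete case. The sketch below uses the standard-library ElementTree and a hypothetical `mixed_content` helper that emits an ordered list of parts, which is one possible way to preserve where the text sits between child elements.

```python
import xml.etree.ElementTree as ET


def mixed_content(elem):
    """Return text runs and child elements in document order (sketch)."""
    parts = []
    if elem.text and elem.text.strip():
        parts.append({"#text": elem.text})
    for child in elem:
        parts.append({child.tag: child.text or ""})
        # ElementTree stores text that follows a child on that child as .tail.
        if child.tail and child.tail.strip():
            parts.append({"#text": child.tail})
    return parts


para = ET.fromstring("<p>Order <b>shipped</b> on <i>Monday</i>.</p>")
parts = mixed_content(para)
print(parts)

# An empty element has no text at all; the converter must pick null, "", or {}.
empty = ET.fromstring("<tag/>")
print(empty.text)  # None
```

The list form is more verbose than a single `#text` key, but it keeps enough information to rebuild the original sentence in order.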
Best Practices for XML-to-JSON Conversion
- Define your mapping conventions upfront — document how attributes, namespaces, and arrays will be handled
- Use schema information when available to determine which elements should be arrays
- Test with edge cases — single vs. multiple elements, empty elements, attributes on text-only elements
- Consider round-trip fidelity — can you convert back to XML without losing information?
- Validate the output — ensure the JSON conforms to any expected schema
Conclusion
XML-to-JSON conversion is deceptively complex, but understanding the structural mismatches and choosing consistent conventions makes it manageable. Whether you're dealing with attributes, namespaces, or the array ambiguity problem, having a clear mapping strategy is essential. For quick, hassle-free conversions, try ConvertMatrix's XML to JSON converter — it handles all these edge cases automatically. Need to go the other direction? The JSON to XML converter has you covered too.