How to Convert XML to JSON in Python

=====================================================

Converting XML to JSON is a common task in data integration and processing. XML (Extensible Markup Language) is a widely used format for data exchange, but JSON (JavaScript Object Notation) is often preferred for its simplicity and ease of use. In this article, we will explore how to convert XML to JSON in Python, covering the most common use case, edge cases, and performance tips.

Quick Example


Here is a minimal example that converts an XML string to JSON using the xmltodict and json libraries:

import xmltodict
import json

xml_string = """
<person>
    <name>John Doe</name>
    <age>30</age>
</person>
"""

xml_dict = xmltodict.parse(xml_string)
json_string = json.dumps(xml_dict)

print(json_string)

This code converts the XML string to a Python dictionary using xmltodict, and then serializes the dictionary to a JSON string using json.dumps.
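
To make the shape of the result concrete, here is a small sketch of how xmltodict typically represents attributes, text values, and repeated tags. The id attribute and the phone elements are made up for illustration and are not part of the example above.

```python
import json

import xmltodict

# Hypothetical input: an attribute and a repeated <phone> tag added for illustration
xml_string = """
<person id="42">
    <name>John Doe</name>
    <age>30</age>
    <phone>555-0100</phone>
    <phone>555-0101</phone>
</person>
"""

data = xmltodict.parse(xml_string)
person = data["person"]

print(person["@id"])    # attributes are stored under "@"-prefixed keys
print(person["age"])    # element text stays a string: "30", not the integer 30
print(person["phone"])  # repeated tags are collected into a list

print(json.dumps(data, indent=2))
```

Because every value arrives as a string, numeric fields such as age need an explicit conversion (for example int(person["age"])) if the JSON consumer expects numbers.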

Step-by-Step Breakdown


Let's walk through the code:

  1. import xmltodict: We import the xmltodict library, which converts XML to Python dictionaries.
  2. import json: We import the json library, which serializes Python objects to JSON strings.
  3. xml_string = """...""": We define an XML string containing a simple person element.
  4. xml_dict = xmltodict.parse(xml_string): We pass the XML string to xmltodict.parse, which returns a Python dictionary representation of the XML data.
  5. json_string = json.dumps(xml_dict): We pass the dictionary to json.dumps, which serializes it to a JSON string.
  6. print(json_string): We print the resulting JSON string.
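
The steps above could be wrapped in a small helper so the conversion is reusable. This is a minimal sketch; the function name xml_to_json and the pass-through of keyword arguments to json.dumps are illustrative choices, not part of xmltodict.

```python
import json

import xmltodict


def xml_to_json(xml_string, **json_kwargs):
    """Parse XML text and return it as a JSON string.

    Extra keyword arguments (for example indent=2) are forwarded to json.dumps.
    """
    parsed = xmltodict.parse(xml_string)
    return json.dumps(parsed, **json_kwargs)


result = xml_to_json("<person><name>John Doe</name><age>30</age></person>")
print(result)
```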

Handling Edge Cases


Empty/Null Input

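If the input XML string is empty, xmltodict.parse raises an ExpatError (from xml.parsers.expat) because there is no root element to parse. Passing None fails as well, although the exact exception may depend on the xmltodict version. Checking for empty input before parsing gives a clearer error message: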

if not xml_string:
    raise ValueError("Input XML string is empty")
xml_dict = xmltodict.parse(xml_string)
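
The check above lets a whitespace-only string through, which still fails inside the parser. A slightly stricter guard might look like the sketch below; require_xml_text is a hypothetical helper name, and it uses only the standard library.

```python
def require_xml_text(xml_input):
    """Return xml_input unchanged, or raise if there is nothing to parse."""
    if xml_input is None:
        raise ValueError("Input XML is None")
    if not isinstance(xml_input, (str, bytes)):
        raise TypeError(f"Expected str or bytes, got {type(xml_input).__name__}")
    if not xml_input.strip():
        # Covers both "" and strings that contain only whitespace
        raise ValueError("Input XML is empty or whitespace-only")
    return xml_input


for candidate in (None, "", "   \n", "<person/>"):
    try:
        require_xml_text(candidate)
        print(repr(candidate), "-> ok")
    except ValueError as err:
        print(repr(candidate), "->", err)
```

The returned value can then be passed straight to xmltodict.parse.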

Invalid Input

If the input XML string is not well-formed, xmltodict.parse raises an ExpatError, which comes from the standard library's xml.parsers.expat module. We can handle this by catching the exception and raising a clearer error:

from xml.parsers.expat import ExpatError

try:
    xml_dict = xmltodict.parse(xml_string)
except ExpatError as e:
    raise ValueError(f"Invalid input XML: {e}")
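
As a complete, runnable illustration of this pattern, the sketch below feeds a deliberately malformed document through a wrapper. The name safe_xml_to_json is hypothetical, and the sketch assumes the parser error is the standard library's ExpatError.

```python
import json
from xml.parsers.expat import ExpatError

import xmltodict


def safe_xml_to_json(xml_string):
    """Convert XML to JSON, turning parser errors into ValueError."""
    try:
        parsed = xmltodict.parse(xml_string)
    except ExpatError as exc:
        # Chain the original exception so the parser's line/column info is kept
        raise ValueError(f"Invalid input XML: {exc}") from exc
    return json.dumps(parsed)


try:
    # <name> is never closed, so the document is not well-formed
    safe_xml_to_json("<person><name>John Doe</person>")
except ValueError as err:
    print(err)
```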

Large Input

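If the input XML is very large, building a single dictionary for the whole document may consume excessive memory. xmltodict offers a streaming mode for this: pass item_depth and item_callback to xmltodict.parse, and the callback is invoked for each element at that depth instead of everything being collected into one result. The savings are largest when you pass a file object rather than a string that is already in memory:

def handle_item(path, item):
    # Process each item individually; return True to keep parsing
    return True

xmltodict.parse(xml_string, item_depth=2, item_callback=handle_item)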
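
Here is a small end-to-end sketch of streaming mode. The people/person document and the handle_person callback are invented for illustration; item_depth=2 selects the elements directly under the root.

```python
import xmltodict

# Hypothetical document: each <person> sits at depth 2, directly under the root
xml_string = """
<people>
    <person><name>John Doe</name><age>30</age></person>
    <person><name>Jane Roe</name><age>41</age></person>
</people>
"""

names = []


def handle_person(path, item):
    # path is a list of (tag, attributes) pairs leading to the current item;
    # item is the dictionary built for one <person> element
    names.append(item["name"])
    return True  # a falsy return value makes xmltodict stop parsing


xmltodict.parse(xml_string, item_depth=2, item_callback=handle_person)
print(names)
```

In a real large-file scenario, the first argument would more likely be a file opened in binary mode, so the document is never held in memory as one string.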

Unicode/Special Characters

xmltodict.parse handles Unicode and escaped special characters (such as &amp;) in the input XML. When serializing the result, note that json.dumps escapes non-ASCII characters as \uXXXX sequences by default. To keep them readable in the output, pass ensure_ascii=False:

json_string = json.dumps(xml_dict, ensure_ascii=False)

Both forms are valid JSON and decode to the same data. If you write the readable form to a file, open the file with encoding="utf-8".
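
The difference is easiest to see side by side. This sketch uses only the json module and a made-up dictionary standing in for parsed XML.

```python
import json

# Stand-in for a dictionary produced by xmltodict.parse
data = {"person": {"name": "José"}}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)
print(readable)

# Both strings decode back to the same Python object
print(json.loads(escaped) == json.loads(readable))
```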

Common Mistakes


1. Forgetting to import dependencies

# Wrong
xml_dict = xmltodict.parse(xml_string)

# Correct
import xmltodict
xml_dict = xmltodict.parse(xml_string)

2. Not handling edge cases

# Wrong
xml_dict = xmltodict.parse(xml_string)

# Correct
from xml.parsers.expat import ExpatError

try:
    xml_dict = xmltodict.parse(xml_string)
except ExpatError as e:
    raise ValueError(f"Invalid input XML: {e}")

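3. Overlooking how json.dumps handles non-ASCII characters

# Default: valid JSON, but non-ASCII characters are written as \uXXXX escapes
json_string = json.dumps(xml_dict)

# Keeps non-ASCII characters readable in the output
json_string = json.dumps(xml_dict, ensure_ascii=False)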

Performance Tips


1. Use streaming mode for large input

Instead of building one dictionary for the whole document, pass item_depth and item_callback to xmltodict.parse so that each item is handed to a callback as soon as it is parsed.

2. Use json.dumps with indent parameter

When serializing the dictionary to a JSON string, use the indent parameter to pretty-print the output:

json_string = json.dumps(xml_dict, indent=4)

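This improves readability when debugging, but the added whitespace makes the output larger, so leave indent out for JSON that only other programs will read.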
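
For comparison, the sketch below serializes the same made-up dictionary in a pretty-printed form and in a compact form. The separators argument is a standard json.dumps option that drops the spaces after commas and colons.

```python
import json

# Stand-in for a dictionary produced by xmltodict.parse
data = {"person": {"name": "John Doe", "age": "30"}}

pretty = json.dumps(data, indent=4)
compact = json.dumps(data, separators=(",", ":"))

print(pretty)
print(compact)
print(len(pretty), len(compact))  # the compact form is shorter
```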

3. Check which dict_constructor you are using

xmltodict.parse accepts a dict_constructor parameter that controls which mapping type is used to build the result:

xml_dict = xmltodict.parse(xml_string, dict_constructor=dict)

Older xmltodict releases built OrderedDict objects by default, so passing dict there may reduce overhead. Recent releases already default to dict, in which case this argument changes nothing.

FAQ


Q: What is the difference between xmltodict and xml.etree.ElementTree?

A: xml.etree.ElementTree is a built-in Python library for parsing XML, while xmltodict is a third-party library that converts XML to Python dictionaries.
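
To illustrate the difference, here is a rough sketch of doing a similar conversion by hand with ElementTree. The helper element_to_dict is hypothetical and deliberately simplistic: it ignores attributes and would overwrite repeated child tags, both of which xmltodict handles for you.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element with simple, non-repeating children into a dict."""
    children = list(elem)
    if not children:
        return elem.text  # leaf element: just its text
    return {child.tag: element_to_dict(child) for child in children}


root = ET.fromstring("<person><name>John Doe</name><age>30</age></person>")
result = {root.tag: element_to_dict(root)}
print(json.dumps(result))
```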

Q: How do I handle XML namespaces?

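A: xmltodict can expand XML namespaces when you pass process_namespaces=True. The optional namespaces argument maps each namespace URI to the short prefix you want to see in the resulting keys:

xml_dict = xmltodict.parse(xml_string, process_namespaces=True, namespaces={'http://example.com': 'ns'})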
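
A short sketch of namespace handling on a made-up document follows. The URI http://example.com/ and the prefix ns are placeholders; note that the mapping goes from URI to the prefix you want in the output keys.

```python
import xmltodict

# Hypothetical document that declares the prefix "a" for http://example.com/
xml_string = """
<root xmlns:a="http://example.com/">
    <a:item>1</a:item>
</root>
"""

parsed = xmltodict.parse(
    xml_string,
    process_namespaces=True,
    # Collapse the full namespace URI to the short prefix "ns" in the keys
    namespaces={"http://example.com/": "ns"},
)

print(parsed["root"]["ns:item"])
```

Without the namespaces mapping, process_namespaces=True would use the full URI as the key prefix instead.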

Q: Can I use xmltodict with other XML formats?

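A: xmltodict parses any well-formed XML document. Schema files such as XML Schema (XSD) or RELAX NG in its XML syntax are themselves XML, so they can be parsed too, but xmltodict does not validate documents against a schema.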

Q: How do I optimize the performance of xmltodict?

A: See the performance tips section above.

Q: What is the license for xmltodict?

A: xmltodict is licensed under the MIT License.