How to Flatten Nested JSON in Python

Flattening nested JSON is a common task in data processing and analysis. JSON (JavaScript Object Notation) is a lightweight data interchange format widely used to exchange data between web servers, web applications, and mobile apps. Real-world JSON is often nested several levels deep, which makes it awkward to load into tables, CSV files, or data frames. This article shows how to flatten nested JSON in Python into a single-level dictionary whose keys are dot-separated paths.

Quick Example

Here is a minimal example that flattens a nested JSON object:

import json

def flatten_json(data):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '.')
        else:
            out[name[:-1]] = x

    flatten(data)
    return out

# Example usage:
json_data = '''
{
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "Anytown",
        "state": "CA",
        "zip": "12345"
    },
    "interests": ["reading", "hiking", "coding"]
}
'''

data = json.loads(json_data)
flattened_data = flatten_json(data)
print(flattened_data)

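This code defines flatten_json, which takes the Python object returned by json.loads and returns a flat dictionary. A nested recursive helper builds each key by joining the path to the value with dots, and list indexes become path segments. For the input above, the result contains keys such as address.city and interests.0.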

Step-by-Step Breakdown

Let's walk through the code line by line:

  1. import json: We import the built-in json module, which parses JSON text into Python objects.
  2. def flatten_json(data):: We define a function flatten_json that takes the parsed data (the dictionary or list returned by json.loads) as input.
  3. out = {}: We initialize an empty dictionary out to store the flattened data.
  4. def flatten(x, name=''):: We define a nested function flatten that takes two arguments: x (the current value being processed) and name (the key path built so far, with a trailing dot after each segment).
  5. if type(x) is dict:: If x is a dictionary, we iterate over its keys and recursively call flatten on each value, appending the key and a dot to name.
  6. elif type(x) is list:: If x is a list, we do the same for each element, using the element's index as the path segment.
  7. else:: Otherwise x is a leaf value (a string, number, boolean, or None). We store it in out under name with the trailing dot removed (name[:-1]).
  8. flatten(data): We start the recursion on the parsed input.
  9. return out: We return the flattened dictionary.
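
The steps above can be restated as a slightly more defensive sketch. It assumes two things that are not in the original code: a hypothetical sep parameter for the key separator, and isinstance checks in place of type(x) is ..., so that dict and list subclasses are traversed as well.

```python
import json


def flatten_json_sep(data, sep="."):
    """Flatten nested dicts/lists into one dict with sep-joined key paths.

    flatten_json_sep and sep are illustrative names, not part of the
    original example.
    """
    out = {}

    def flatten(x, path):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, path + (str(key),))
        elif isinstance(x, list):
            for index, value in enumerate(x):
                flatten(value, path + (str(index),))
        else:
            # Leaf value: join the collected path parts into one key.
            out[sep.join(path)] = x

    flatten(data, ())
    return out


data = json.loads('{"user": {"name": "John", "tags": ["a", "b"]}}')
print(flatten_json_sep(data))
# {'user.name': 'John', 'user.tags.0': 'a', 'user.tags.1': 'b'}
print(flatten_json_sep(data, sep="/"))
# {'user/name': 'John', 'user/tags/0': 'a', 'user/tags/1': 'b'}
```

A separator that never appears in the keys avoids collisions. With the dot separator, for instance, {"a.b": 1} and {"a": {"b": 2}} both produce the key a.b.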

Handling Edge Cases

Here are some common edge cases to consider:

Empty/Null Input

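An empty JSON object flattens to an empty dictionary. Two related cases behave differently: a top-level null parses to None and is returned as {'': None}, and empty nested objects or arrays produce no keys at all. Here is the empty-object case: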

json_data = '{}'
data = json.loads(json_data)
flattened_data = flatten_json(data)
print(flattened_data)  # Output: {}
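
The other empty cases can be checked directly. The sketch below restates the recursive function from the Quick Example and runs it on null and on nested empty containers; the sample inputs are made up for illustration.

```python
import json


def flatten_json(data):
    # Same recursive logic as the Quick Example.
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '.')
        else:
            out[name[:-1]] = x

    flatten(data)
    return out


empty_object = flatten_json(json.loads('{}'))
top_level_null = flatten_json(json.loads('null'))
nested_empties = flatten_json(json.loads('{"a": {}, "b": [], "c": null}'))

print(empty_object)     # {}
print(top_level_null)   # {'': None}
print(nested_empties)   # {'c': None}
```

If the empty containers carry meaning in your data, one option is to store an empty dict or list as a leaf value instead of skipping it.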

Invalid Input

If the input string is not valid JSON, json.loads raises json.JSONDecodeError. We can handle this by wrapping the json.loads call in a try-except block:

try:
    data = json.loads(json_data)
    flattened_data = flatten_json(data)
    print(flattened_data)
except json.JSONDecodeError as e:
    print(f"Invalid JSON: {e}")
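
When reporting the error to a user, the position of the problem is often more useful than the message alone. The following sketch wraps parsing in a hypothetical helper, parse_or_report (not part of the original code), and uses the msg, lineno, and colno attributes that json.JSONDecodeError provides.

```python
import json


def parse_or_report(text):
    """Return (data, None) on success or (None, message) on invalid JSON.

    parse_or_report is an illustrative helper, not part of the original code.
    """
    try:
        return json.loads(text), None
    except json.JSONDecodeError as e:
        # msg, lineno and colno are attributes of json.JSONDecodeError.
        return None, f"Invalid JSON at line {e.lineno}, column {e.colno}: {e.msg}"


good_data, good_error = parse_or_report('{"name": "John"}')
bad_data, bad_error = parse_or_report('{"name": "John", "age": }')

print(good_data)   # {'name': 'John'}
print(bad_data)    # None
print(bad_error)
```

Returning the message instead of printing it lets the caller decide whether to log it, show it, or raise a different exception.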

Large Input

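Recursion depth depends on how deeply the JSON is nested, not on its overall size. For very deeply nested input (around 1,000 levels with the default settings), the recursive function raises RecursionError. We can avoid this by replacing the recursion with an explicit stack: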

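def flatten_json(data):
    out = {}
    # Explicit stack of (value, key path) pairs replaces the call stack.
    stack = [(data, '')]

    while stack:
        x, name = stack.pop()
        if type(x) is dict:
            # Push in reverse so keys come out in their original order.
            for a in reversed(list(x)):
                stack.append((x[a], name + a + '.'))
        elif type(x) is list:
            for i in range(len(x) - 1, -1, -1):
                stack.append((x[i], name + str(i) + '.'))
        else:
            out[name[:-1]] = x

    return out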
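
To check this without a huge JSON file, the sketch below builds a deeply nested dictionary in a loop and runs both approaches on it. The depth (the current recursion limit plus 100) and the function names are choices made for this illustration.

```python
import sys


def flatten_recursive(data):
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '.')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '.')
        else:
            out[name[:-1]] = x

    flatten(data)
    return out


def flatten_iterative(data):
    out = {}
    stack = [(data, '')]
    while stack:
        x, name = stack.pop()
        if isinstance(x, dict):
            for key in x:
                stack.append((x[key], name + key + '.'))
        elif isinstance(x, list):
            for i, item in enumerate(x):
                stack.append((item, name + str(i) + '.'))
        else:
            out[name[:-1]] = x
    return out


# Build {"k": {"k": ... {"k": "leaf"}}}, nested deeper than the recursion limit.
depth = sys.getrecursionlimit() + 100
nested = "leaf"
for _ in range(depth):
    nested = {"k": nested}

try:
    flatten_recursive(nested)
    recursive_ok = True
except RecursionError:
    recursive_ok = False

flat = flatten_iterative(nested)
print(recursive_ok)   # False
print(len(flat))      # 1
```

Note that json.loads can itself raise RecursionError on extremely deep documents, so the parsing step may need the same care as the flattening step.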

Unicode/Special Characters

Python 3 strings are Unicode, and json.loads decodes \uXXXX escapes into the corresponding characters, so the function needs no special handling for non-ASCII text. We can confirm this with a JSON object that contains non-ASCII characters:

json_data = '''
{
    "name": "\u2605",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "\u00c9vian",
        "state": "CA",
        "zip": "12345"
    }
}
'''

data = json.loads(json_data)
flattened_data = flatten_json(data)
print(flattened_data)
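
Flattening only builds key strings and copies values, so Unicode handling comes down to how the json module reads and writes text. As a small sketch (the sample document is made up), this shows json.loads decoding the escapes and the effect of the ensure_ascii parameter of json.dumps:

```python
import json

# A raw string keeps the \u escapes intact so that json.loads decodes them.
raw = r'{"name": "\u2605", "address": {"city": "\u00c9vian"}}'
data = json.loads(raw)

print(len(data["name"]))   # 1 (a single character, the star U+2605)

# By default json.dumps escapes non-ASCII characters again.
escaped = json.dumps(data)
print(escaped)
# {"name": "\u2605", "address": {"city": "\u00c9vian"}}

# ensure_ascii=False writes the characters themselves instead of escapes.
readable = json.dumps(data, ensure_ascii=False)
print(readable == '{"name": "★", "address": {"city": "Évian"}}')   # True
```

Both forms are valid JSON and parse back to the same Python values; ensure_ascii=False is mainly useful when people will read the output.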

Common Mistakes

Here are some common mistakes developers make when flattening JSON data:

Mistake 1: Not Handling Nested Lists

If the function only recurses into dictionaries, every list is stored as a single value, so nested lists and any objects inside them are left unflattened. For example:

json_data = '''
{
    "name": "John",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "Anytown",
        "state": "CA",
        "zip": "12345"
    },
    "interests": [
        ["reading", "hiking"],
        ["coding", "gaming"]
    ]
}
'''

# Incorrect code:
def flatten_json(data):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        else:
            out[name[:-1]] = x

    flatten(data)
    return out

# Corrected code:
def flatten_json(data):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '.')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(data)
    return out
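
To make the difference concrete, the sketch below runs both versions on a reduced copy of the nested interests array. The two function names are hypothetical and exist only to keep the versions apart in one file.

```python
import json


def flatten_dicts_only(data):
    # Mirrors the incorrect version: lists are treated as leaf values.
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        else:
            out[name[:-1]] = x

    flatten(data)
    return out


def flatten_with_lists(data):
    # Mirrors the corrected version: list indexes become path segments.
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '.')
        else:
            out[name[:-1]] = x

    flatten(data)
    return out


data = json.loads('{"interests": [["reading", "hiking"], ["coding", "gaming"]]}')

print(flatten_dicts_only(data))
# {'interests': [['reading', 'hiking'], ['coding', 'gaming']]}
print(flatten_with_lists(data))
# {'interests.0.0': 'reading', 'interests.0.1': 'hiking', 'interests.1.0': 'coding', 'interests.1.1': 'gaming'}
```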

Mistake 2: Not Handling Unicode Characters

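Python 3 strings are already Unicode, so values need no encoding step. Forcing values to ASCII, as the incorrect code below does, silently drops non-ASCII characters, turns strings into bytes, and raises AttributeError for non-string values such as numbers. For example: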

json_data = '''
{
    "name": "\u2605",
    "age": 30,
    "address": {
        "street": "123 Main St",
        "city": "\u00c9vian",
        "state": "CA",
        "zip": "12345"
    }
}
'''

# Incorrect code:
def flatten_json(data):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        else:
            out[name[:-1]] = x.encode('ascii', 'ignore')

    flatten(data)
    return out

# Corrected code:
def flatten_json(data):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '.')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(data)
    return out

Mistake 3: Not Handling Large Input

If the JSON is nested more deeply than Python's recursion limit allows, the recursive function raises RecursionError. The size of the document alone is not the problem; the nesting depth is. For example:

json_data = '''
{
    ...
}
'''

# Incorrect code:
def flatten_json(data):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '.')
        else:
            out[name[:-1]] = x

    flatten(data)
    return out

# Corrected code:
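def flatten_json(data):
    out = {}
    # Explicit stack of (value, key path) pairs replaces the call stack.
    stack = [(data, '')]

    while stack:
        x, name = stack.pop()
        if type(x) is dict:
            # Push in reverse so keys come out in their original order.
            for a in reversed(list(x)):
                stack.append((x[a], name + a + '.'))
        elif type(x) is list:
            for i in range(len(x) - 1, -1, -1):
                stack.append((x[i], name + str(i) + '.'))
        else:
            out[name[:-1]] = x

    return out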

Performance Tips

Here are some performance tips for flattening JSON data in Python:

  1. Use an iterative approach for deep nesting: An explicit stack avoids Python's recursion limit and the cost of one function call per node.
  2. Build the result in one pass: Write each leaf straight into the output dictionary, or yield key-value pairs from a generator and collect them with a dictionary comprehension, rather than merging the results of recursive calls.
  3. Avoid unnecessary allocations: Use a single output dictionary for the whole traversal instead of creating and combining a new dictionary at each level.
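
One way to combine these tips is a generator that walks the data with an explicit stack and yields one key-value pair per leaf. This is a sketch under assumptions: iter_flat and its sep parameter are illustrative names that do not appear in the code above.

```python
def iter_flat(data, sep="."):
    """Yield (key_path, value) pairs for every leaf, without recursion.

    iter_flat and sep are illustrative names, not part of the original code.
    """
    stack = [((), data)]
    while stack:
        path, x = stack.pop()
        if isinstance(x, dict):
            # Push in reverse so that items pop in their original order.
            for key, value in reversed(list(x.items())):
                stack.append((path + (str(key),), value))
        elif isinstance(x, list):
            for index in range(len(x) - 1, -1, -1):
                stack.append((path + (str(index),), x[index]))
        else:
            yield sep.join(path), x


data = {"name": "John", "address": {"city": "Anytown", "zip": "12345"}, "tags": ["a", "b"]}

flat = {key: value for key, value in iter_flat(data)}
print(flat)
# {'name': 'John', 'address.city': 'Anytown', 'address.zip': '12345', 'tags.0': 'a', 'tags.1': 'b'}

# The generator also allows filtering without building the full dictionary first.
address_only = {k: v for k, v in iter_flat(data) if k.startswith("address.")}
print(address_only)
# {'address.city': 'Anytown', 'address.zip': '12345'}
```

Whether this is faster than the plain loop depends on the data, so it is worth measuring with timeit on representative input before committing to it.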

FAQ

Q: What is the maximum recursion depth in Python?

A: The default recursion limit in CPython is 1000. You can read it with sys.getrecursionlimit() and change it by calling sys.setrecursionlimit(), although setting it too high risks crashing the interpreter.
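
As a minimal sketch, the limit can be inspected and raised temporarily like this. The increase of 1000 is an arbitrary value chosen for the example.

```python
import sys

original_limit = sys.getrecursionlimit()
print(original_limit)  # commonly 1000, but the environment may change it

# Raise the limit temporarily, then restore the previous value.
sys.setrecursionlimit(original_limit + 1000)
try:
    raised_limit = sys.getrecursionlimit()
finally:
    sys.setrecursionlimit(original_limit)

print(raised_limit - original_limit)  # 1000
```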

Q: How do I handle Unicode characters in JSON data?

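A: Python 3 strings are Unicode, and json.loads decodes \uXXXX escapes automatically, so no extra step is needed. When writing JSON back out, pass ensure_ascii=False to json.dumps to keep non-ASCII characters unescaped.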

Q: How do I handle large JSON objects?

A: You can handle large or deeply nested JSON objects by using an iterative approach with an explicit stack, which does not depend on Python's recursion limit.

Q: What is the best way to flatten JSON data in Python?

A: For typical inputs, the recursive function from the Quick Example is sufficient. For deeply nested data, prefer the iterative version with an explicit stack.

Q: Can I use this code to flatten JSON data from a file?

A: Yes. Open the file, parse it with json.load (or read it into a string and call json.loads), and pass the resulting Python object to the flatten_json function.
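
A minimal sketch of the loading step follows. It writes a small sample file to a temporary directory first so that it can run anywhere; the file name sample.json and its contents are made up for the illustration. The resulting data object is what flatten_json expects as input.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # sample.json is a made-up file name for this illustration.
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"name": "John", "address": {"city": "Anytown"}}, f)

    # json.load parses directly from the open file object.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

print(type(data).__name__)   # dict
print(data["address"])       # {'city': 'Anytown'}
```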
