How to Convert CSV to JSON in Python

====================================================================

Converting CSV (Comma-Separated Values) to JSON (JavaScript Object Notation) is a common data transformation task in data processing and analysis. CSV is a widely used format for tabular data, while JSON is a lightweight data interchange format that is easily readable by both humans and machines. In this guide, we will walk through the process of converting CSV to JSON in Python, covering the most common use case, edge cases, common mistakes, and performance tips.

Quick Example

Here is a minimal example that converts a CSV file to a JSON file using the csv and json modules:

import csv
import json

# Define the input and output file paths
input_file = 'input.csv'
output_file = 'output.json'

# Read the CSV file (newline='' lets the csv module handle line endings itself)
with open(input_file, 'r', newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    data = list(reader)

# Write the JSON file
with open(output_file, 'w', encoding='utf-8') as jsonfile:
    json.dump(data, jsonfile, indent=4)

This code assumes that the CSV file is UTF-8 encoded and has a header row with column names.

Step-by-Step Breakdown

Let's walk through the code line by line:

import csv and import json: We import the csv and json modules from the standard library. They provide functions for reading and writing CSV and JSON data, respectively.
input_file = 'input.csv' and output_file = 'output.json': We define the input and output file paths as strings.
with open(input_file, 'r', newline='', encoding='utf-8') as csvfile: We open the input CSV file in read mode ('r') using a with statement, which closes the file when the block ends, even if an error occurs. The csv module documentation recommends newline='' because without it, line breaks inside quoted fields are not read correctly. encoding='utf-8' avoids depending on the platform's default encoding.
reader = csv.DictReader(csvfile): We create a DictReader object. It uses the first row as the column names and returns a dictionary for each following row. The keys are the column names and the values are that row's fields, as strings.
data = list(reader): We read every remaining row into a list of dictionaries.
with open(output_file, 'w', encoding='utf-8') as jsonfile: We open the output JSON file in write mode ('w'), which creates the file or overwrites an existing one.
json.dump(data, jsonfile, indent=4): We serialize the list of dictionaries to the JSON file with json.dump. Passing indent=4 pretty-prints the output with 4-space indentation.
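DictReader returns every field as a string, so the Quick Example writes numbers as quoted strings ("36" rather than 36). Converting them is left to you. The sketch below shows one way to do it, assuming you know which columns are numeric. convert_value and read_typed_rows are hypothetical names, and the int-then-float rule is an assumption, not something the csv module provides. Restricting conversion to named columns matters: applied blindly, the rule would turn a ZIP code such as "02134" into 2134.

```python
import csv
import io
import json
import math

def convert_value(value):
    """Hypothetical helper: return an int or float if value looks numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):  # TypeError covers None from short rows
        pass
    try:
        number = float(value)
    except (TypeError, ValueError):
        return value  # not numeric: keep the original string
    # float() also accepts 'nan' and 'inf', which standard JSON cannot represent
    return number if math.isfinite(number) else value

def read_typed_rows(csvfile, numeric_columns):
    """Read rows with DictReader, converting only the listed columns."""
    return [
        {key: convert_value(value) if key in numeric_columns else value
         for key, value in row.items()}
        for row in csv.DictReader(csvfile)
    ]

sample = io.StringIO("name,age,score\nAda,36,9.5\n")
rows = read_typed_rows(sample, numeric_columns={"age", "score"})
print(json.dumps(rows))  # [{"name": "Ada", "age": 36, "score": 9.5}]
```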

Handling Edge Cases

Here are some common edge cases to consider:

Empty/Null Input

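If the input file is empty, DictReader does not raise an exception. reader.fieldnames is None, the reader yields no rows, and the script silently writes an empty JSON array ([]). If an empty input should be treated as an error, check the file size before reading it. If the file does not exist at all, both open() and os.path.getsize() raise FileNotFoundError.

import os
import sys

if os.path.getsize(input_file) == 0:
    print("Input file is empty", file=sys.stderr)
    sys.exit(1)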
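A size check does not catch a file that contains only blank lines. An alternative is to inspect reader.fieldnames, which DictReader fills from the first row. It is None for a zero-byte file and an empty list when the first line is blank. A sketch, assuming you want headerless input to be an error; csv_to_json_string is a hypothetical name:

```python
import csv
import io
import json

def csv_to_json_string(csvfile):
    """Hypothetical helper: convert an open CSV file to a JSON string.

    Raises ValueError when there is no header row, instead of silently
    producing "[]".
    """
    reader = csv.DictReader(csvfile)
    # Accessing fieldnames reads the first row: None for a zero-byte file,
    # an empty list when the first line is blank
    if not reader.fieldnames:
        raise ValueError("input has no header row")
    return json.dumps(list(reader))

print(csv_to_json_string(io.StringIO("a,b\n")))       # [] (header only, no data rows)
print(csv_to_json_string(io.StringIO("a,b\n1,2\n")))  # [{"a": "1", "b": "2"}]
```

A file with a header but no data rows still converts to [], which is usually the desired result.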

Invalid Input

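If the input CSV file is malformed, the reader may raise a csv.Error exception. The exception is raised while rows are being parsed, not when the DictReader object is created. The try-except block must therefore wrap the code that reads the rows:

import csv
import sys

try:
    with open(input_file, 'r', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        data = list(reader)  # rows are parsed (and csv.Error raised) here
except csv.Error as e:
    sys.exit(f"Error reading CSV file at line {reader.line_num}: {e}")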
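By default the csv parser is lenient, so many malformed files never raise csv.Error at all. DictReader also does not check field counts. Extra values are collected in a list under the key None, and missing values are filled with None. If you want bad input rejected, you can pass strict=True, which DictReader forwards to the underlying csv.reader, and check field counts yourself. A sketch; the read_rows_strictly name and the choice to raise ValueError for ragged rows are assumptions:

```python
import csv
import io

def read_rows_strictly(csvfile):
    """Hypothetical helper: read rows, rejecting malformed or ragged input.

    strict=True makes the parser raise csv.Error on bad quoting. Rows with
    too many values (stored under the key None) or too few (filled with
    None) are rejected with ValueError.
    """
    reader = csv.DictReader(csvfile, strict=True)
    rows = []
    for row in reader:
        if None in row:  # a None key holds the surplus values
            raise ValueError(f"line {reader.line_num}: more fields than the header")
        if None in row.values():  # None values fill the missing fields
            raise ValueError(f"line {reader.line_num}: fewer fields than the header")
        rows.append(row)
    return rows
```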

Large Input

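If the input CSV file is very large, holding every row in a list and then serializing that list with a single json.dump call may not be feasible. DictReader already reads the file one row at a time. The fix is to write each row to the output as soon as it is read, instead of collecting the rows first:

import csv
import json

with open(input_file, 'r', newline='', encoding='utf-8') as csvfile, \
        open(output_file, 'w', encoding='utf-8') as jsonfile:
    reader = csv.DictReader(csvfile)
    jsonfile.write('[')
    for i, row in enumerate(reader):
        # Separate objects with commas; only the current row is held in memory
        jsonfile.write(',\n' if i else '\n')
        json.dump(row, jsonfile)
    jsonfile.write('\n]\n')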
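If you actually need batches of rows, for example to split the output into several smaller JSON files, itertools.islice can pull chunk_size rows at a time without reading the rest of the file. A sketch, assuming a hypothetical csv_to_json_chunks helper and output files named <prefix>_0.json, <prefix>_1.json, and so on. On Python 3.12+, itertools.batched could replace iter_chunks.

```python
import csv
import itertools
import json

def iter_chunks(reader, chunk_size):
    """Yield lists of up to chunk_size rows, reading lazily from reader."""
    while True:
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

def csv_to_json_chunks(input_file, output_prefix, chunk_size=1000):
    """Hypothetical helper: write each chunk to <output_prefix>_<n>.json.

    Only one chunk of rows is in memory at a time. Returns the paths written.
    """
    paths = []
    with open(input_file, 'r', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for n, chunk in enumerate(iter_chunks(reader, chunk_size)):
            path = f"{output_prefix}_{n}.json"
            with open(path, 'w', encoding='utf-8') as jsonfile:
                json.dump(chunk, jsonfile, indent=4)
            paths.append(path)
    return paths
```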

Unicode/Special Characters

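If the input CSV file contains non-ASCII characters, specify the encoding explicitly. Without it, open() falls back to the platform's default encoding (for example, cp1252 on many Windows systems). A UTF-8 file can then fail with UnicodeDecodeError or be decoded into the wrong characters:

with open(input_file, 'r', newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)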
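Two related details are worth knowing. First, files saved as "UTF-8 with BOM", a common spreadsheet export option, start with a byte order mark that the 'utf-8-sig' codec strips. Second, json.dump escapes non-ASCII characters by default (ensure_ascii=True). A sketch handling both; convert_utf8_csv is a hypothetical name, and using 'utf-8-sig' assumes your input is UTF-8:

```python
import csv
import json

def convert_utf8_csv(input_file, output_file):
    """Convert a UTF-8 CSV file (with or without a BOM) to readable UTF-8 JSON."""
    # 'utf-8-sig' reads plain UTF-8 and also strips a leading byte order mark;
    # with 'utf-8' the BOM would stay attached to the first column name.
    with open(input_file, 'r', newline='', encoding='utf-8-sig') as csvfile:
        data = list(csv.DictReader(csvfile))
    # ensure_ascii=False writes characters such as 'é' as-is instead of
    # backslash-u escapes, so the output file is opened as UTF-8.
    with open(output_file, 'w', encoding='utf-8') as jsonfile:
        json.dump(data, jsonfile, indent=4, ensure_ascii=False)
    return data
```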

Common Mistakes

Here are some common mistakes to avoid:

Mistake 1: Not Handling Edge Cases

# Wrong
with open(input_file, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    data = [row for row in reader]

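# Correct
import csv
import sys

try:
    with open(input_file, 'r', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        data = list(reader)
except FileNotFoundError:
    sys.exit(f"Input file not found: {input_file}")
except (csv.Error, UnicodeDecodeError) as e:
    sys.exit(f"Error reading CSV file: {e}")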

Mistake 2: Not Specifying Encoding

# Wrong
with open(input_file, 'r') as csvfile:
    reader = csv.DictReader(csvfile)

# Correct
with open(input_file, 'r', newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)

Mistake 3: Not Handling Large Input

# Wrong
with open(input_file, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    data = [row for row in reader]

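# Correct
import csv
import json

with open(input_file, 'r', newline='', encoding='utf-8') as csvfile, \
        open(output_file, 'w', encoding='utf-8') as jsonfile:
    reader = csv.DictReader(csvfile)
    jsonfile.write('[')
    for i, row in enumerate(reader):
        # Write each row immediately instead of accumulating rows in memory
        jsonfile.write(',\n' if i else '\n')
        json.dump(row, jsonfile)
    jsonfile.write('\n]\n')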

Performance Tips

Here are some performance tips to keep in mind:

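Use csv.DictReader for convenience, not speed: DictReader builds a dictionary for every row in Python code. This is more convenient to work with than a list of values, but slower than csv.reader. If lists of values are enough (for example, to produce a JSON array of arrays), csv.reader avoids that overhead.
Use json.dump to save memory: json.dump writes the JSON data to the file piece by piece, so the whole JSON string never has to exist in memory at once. It is not necessarily faster than json.dumps followed by a single write(). In CPython, json.dumps is often faster because it can use the C-accelerated encoder, at the cost of holding the entire string in memory.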
Use with statements: with statements ensure that files are properly closed when we're done with them, which helps prevent file descriptor leaks.
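Whether either difference matters depends on your data, so it is worth measuring. A rough timeit sketch on hypothetical in-memory data, with io.StringIO standing in for real files. No timings are shown because they vary by machine and Python version.

```python
import csv
import io
import json
import timeit

# Hypothetical sample data: 10,000 rows, built in memory so no files are needed
SAMPLE = "id,name,score\n" + "".join(f"{i},name{i},{i % 100}\n" for i in range(10_000))
ROWS = list(csv.DictReader(io.StringIO(SAMPLE)))

def read_with_reader():
    return list(csv.reader(io.StringIO(SAMPLE)))

def read_with_dictreader():
    return list(csv.DictReader(io.StringIO(SAMPLE)))

def write_with_dump():
    json.dump(ROWS, io.StringIO())  # io.StringIO stands in for a real file

def write_with_dumps():
    io.StringIO().write(json.dumps(ROWS))

def benchmark(number=5):
    """Return total seconds for `number` runs of each variant."""
    candidates = [read_with_reader, read_with_dictreader, write_with_dump, write_with_dumps]
    return {fn.__name__: timeit.timeit(fn, number=number) for fn in candidates}

if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name}: {seconds:.3f} s")
```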

FAQ

Q: What is the difference between `csv.reader` and `csv.DictReader`?

A: csv.reader returns a list of values for each row, while csv.DictReader returns a dictionary with column names as keys and row values as values.
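A quick side-by-side comparison on the same two-line input (hypothetical sample data):

```python
import csv
import io

text = "name,age\nAda,36\n"

# csv.reader: one list per line; the header is just the first list
rows = list(csv.reader(io.StringIO(text)))
print(rows)     # [['name', 'age'], ['Ada', '36']]

# csv.DictReader: consumes the header and uses it as keys for each row
records = list(csv.DictReader(io.StringIO(text)))
print(records)  # [{'name': 'Ada', 'age': '36'}]
```

Since Python 3.8, DictReader returns plain dicts; earlier 3.x versions returned OrderedDict.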

Q: How do I handle large input CSV files?

A: Process rows as they are read and write each one to the output immediately, instead of collecting them all in a list first. If you already use pandas, pandas.read_csv also accepts a chunksize argument that returns the file in chunks.

Q: What is the best way to handle Unicode characters in CSV files?

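A: Specify the encoding when opening the file, such as encoding='utf-8' (or encoding='utf-8-sig' if the file may start with a byte order mark). Also pass ensure_ascii=False to json.dump if you want non-ASCII characters written as-is rather than as escape sequences.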

Q: How do I pretty-print JSON output?

A: You can pass indent=4 to the json.dump function to pretty-print the JSON output with 4-space indentation.
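indent only adds whitespace, so it trades file size for readability. The opposite choice, separators=(",", ":"), removes the spaces json adds by default. A small comparison on hypothetical data:

```python
import json

data = [{"a": "1"}]

print(json.dumps(data, indent=4))
# [
#     {
#         "a": "1"
#     }
# ]

print(json.dumps(data, separators=(",", ":")))  # [{"a":"1"}]
```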

Q: What is the difference between `json.dump` and `json.dumps`?

A: json.dump writes the JSON data directly to a file, while json.dumps serializes the data to a string.
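Both functions produce the same text; they differ only in where it goes. A minimal illustration, using io.StringIO as a stand-in for an open file:

```python
import io
import json

data = [{"id": "1"}]

text = json.dumps(data)     # returns a str
print(text)                 # [{"id": "1"}]

buffer = io.StringIO()      # any object with a write() method works
json.dump(data, buffer)     # writes the same text instead of returning it
print(buffer.getvalue() == text)  # True
```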
