How to Convert CSV to JSON in Python

====================================================================

Converting CSV (Comma Separated Values) to JSON (JavaScript Object Notation) is a common data transformation task in data processing and analysis. CSV is a widely used format for tabular data, while JSON is a lightweight data interchange format that is easily readable by both humans and machines. In this guide, we will walk through the process of converting CSV to JSON in Python, covering the most common use case, edge cases, common mistakes, and performance tips.

Quick Example


Here is a minimal example that converts a CSV file to a JSON file using the csv and json modules:

import csv
import json

# Define the input and output file paths
input_file = 'input.csv'
output_file = 'output.json'

# Read the CSV file
with open(input_file, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    data = [row for row in reader]

# Write the JSON file
with open(output_file, 'w') as jsonfile:
    json.dump(data, jsonfile, indent=4)

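This code assumes that the CSV file has a header row with column names. For real-world files, the csv module documentation also recommends passing newline='' to open() so that line breaks embedded in quoted fields are handled correctly.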

Step-by-Step Breakdown


Let's walk through the code line by line:

  1. import csv and import json: We import the csv and json modules, which provide functions for reading and writing CSV and JSON files, respectively.
  2. input_file = 'input.csv' and output_file = 'output.json': We define the input and output file paths as strings.
  3. with open(input_file, 'r') as csvfile:: We open the input CSV file in read mode ('r') using a with statement, which ensures that the file is properly closed when we're done with it.
  4. reader = csv.DictReader(csvfile): We create a DictReader object to read the CSV file. This object returns a dictionary for each row, where the keys are the column names and the values are the row values.
  5. data = [row for row in reader]: We read the entire CSV file into a list of dictionaries using a list comprehension.
  6. with open(output_file, 'w') as jsonfile:: We open the output JSON file in write mode ('w') using a with statement.
  7. json.dump(data, jsonfile, indent=4): We write the list of dictionaries to the JSON file using the dump function from the json module. We pass indent=4 to pretty-print the JSON output with 4-space indentation.
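
To see what these steps produce, here is a small sketch that runs the same conversion in memory. The sample data and the use of io.StringIO in place of real files are assumptions made for illustration. Note that DictReader returns every value as a string, so numbers appear quoted in the JSON output unless they are converted explicitly.

```python
import csv
import io
import json

# Hypothetical sample data standing in for input.csv
sample_csv = "name,age\nAlice,30\nBob,25\n"

reader = csv.DictReader(io.StringIO(sample_csv))
data = [row for row in reader]

# Every value is a string at this point, including the ages
json_text = json.dumps(data, indent=4)
print(json_text)
```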

Handling Edge Cases


Here are some common edge cases to consider:

Empty/Null Input

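If the input CSV file is empty, DictReader does not raise an exception: it simply yields no rows, so data is an empty list and the output file contains []. A file that contains only a header row behaves the same way. If an empty file should be treated as an error, we can check its size before trying to read it: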

import os

if os.path.getsize(input_file) == 0:
    print("Input file is empty")
    exit(1)
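
A quick way to check how DictReader behaves on empty input is to run it against in-memory data. In this sketch, io.StringIO stands in for real files (an assumption for illustration). Both an empty file and a header-only file produce an empty list without raising:

```python
import csv
import io

# An empty file and a header-only file both yield zero rows
empty_rows = list(csv.DictReader(io.StringIO("")))

header_only = csv.DictReader(io.StringIO("name,age\n"))
header_only_rows = list(header_only)

print(empty_rows)
print(header_only_rows)
print(header_only.fieldnames)
```

Because a header-only file has a non-zero size, a check on the resulting list (for example, if not data) catches both cases, whereas a file-size check catches only the first.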

Invalid Input

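If the input CSV file is malformed, the csv module may raise a csv.Error exception. The error is raised while rows are being read, not when the DictReader object is created, so the try-except block must wrap the code that iterates over the reader. Rows with too few or too many fields do not raise an error: by default, DictReader fills missing values with None and collects extra values in a list under the key None.

try:
    with open(input_file, 'r', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        data = list(reader)  # rows are parsed here, so this is where csv.Error surfaces
except csv.Error as e:
    print(f"Error reading CSV file: {e}")
    exit(1)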
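
The following sketch shows one way a csv.Error can surface during iteration. By default the parser is lenient, so the example passes strict=True (a standard csv dialect parameter) to make it reject a stray character after a closing quote. The sample data and the read_rows helper are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical malformed data: a stray character follows the closing quote
bad_csv = 'name,note\nAlice,"hi"x\n'
good_csv = 'name,note\nAlice,"hi"\n'

def read_rows(text):
    """Return (rows, error_message); error_message is None on success."""
    reader = csv.DictReader(io.StringIO(text), strict=True)
    try:
        # Parsing happens during iteration, so the try block wraps list()
        return list(reader), None
    except csv.Error as e:
        return [], str(e)

rows, error = read_rows(bad_csv)
print(rows, error)
```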

Large Input

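If the input CSV file is very large, reading the entire file into memory may not be feasible. In this case, we can read the file in chunks and write each chunk out before reading the next, so that at most chunk_size rows are held in memory at once. The output below uses the JSON Lines format (one JSON object per line) rather than a single JSON array:

import csv
import json

chunk_size = 1000
with open(input_file, 'r', newline='') as csvfile, open(output_file, 'w') as jsonfile:
    reader = csv.DictReader(csvfile)
    chunk = []
    for row in reader:
        chunk.append(json.dumps(row))
        if len(chunk) == chunk_size:
            # Flush the chunk to disk, then discard it to free memory
            jsonfile.write('\n'.join(chunk) + '\n')
            chunk = []
    if chunk:
        # Write any rows left over after the last full chunk
        jsonfile.write('\n'.join(chunk) + '\n')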
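
When the output must remain a single JSON array, rows can still be written one at a time by emitting the brackets and commas manually. This is a minimal sketch under that assumption. The function name csv_to_json_array is hypothetical, and io.StringIO objects stand in for the real input and output files.

```python
import csv
import io
import json

def csv_to_json_array(csv_stream, json_stream):
    """Write rows from csv_stream to json_stream as one JSON array, row by row."""
    reader = csv.DictReader(csv_stream)
    json_stream.write("[")
    for i, row in enumerate(reader):
        if i > 0:
            json_stream.write(",")  # separator between array elements
        json_stream.write("\n  " + json.dumps(row))
    json_stream.write("\n]\n")

# In-memory stand-ins for the input and output files
source = io.StringIO("name,age\nAlice,30\nBob,25\n")
target = io.StringIO()
csv_to_json_array(source, target)
print(target.getvalue())
```

Only the current row is held in memory, at the cost of giving up json.dump's indentation handling for the array as a whole.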

Unicode/Special Characters

If the input CSV file contains non-ASCII characters, we should specify the encoding explicitly when opening the file, because the default encoding depends on the platform:

with open(input_file, 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
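
Two related details are worth a sketch. Files exported from spreadsheet tools often start with a byte order mark (BOM), which the 'utf-8-sig' codec strips. On the output side, json.dump and json.dumps escape non-ASCII characters by default unless ensure_ascii=False is passed. The sample bytes below are hypothetical and stand in for a real file.

```python
import csv
import io
import json

# Hypothetical file contents: UTF-8 with a leading byte order mark (BOM)
raw_bytes = "name,city\nJosé,Zürich\n".encode("utf-8-sig")

# 'utf-8-sig' strips the BOM; plain 'utf-8' would leave it attached to the first column name
text_stream = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-8-sig", newline="")
rows = list(csv.DictReader(text_stream))

escaped = json.dumps(rows)                       # non-ASCII written as \uXXXX escapes
readable = json.dumps(rows, ensure_ascii=False)  # non-ASCII written as-is
print(escaped)
print(readable)
```

Both strings decode to the same data. If ensure_ascii=False is used when writing to a file, the output file should also be opened with an explicit encoding such as 'utf-8'.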

Common Mistakes


Here are some common mistakes to avoid:

Mistake 1: Not Handling Edge Cases

# Wrong
with open(input_file, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    data = [row for row in reader]

# Correct
try:
    with open(input_file, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        data = [row for row in reader]
except csv.Error as e:
    print(f"Error reading CSV file: {e}")
    exit(1)

Mistake 2: Not Specifying Encoding

# Wrong
with open(input_file, 'r') as csvfile:
    reader = csv.DictReader(csvfile)

# Correct
with open(input_file, 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)

Mistake 3: Not Handling Large Input

# Wrong
with open(input_file, 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    data = [row for row in reader]

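# Correct: write each chunk out (as JSON Lines) instead of keeping every row in memory
chunk_size = 1000
with open(input_file, 'r', newline='') as csvfile, open(output_file, 'w') as jsonfile:
    reader = csv.DictReader(csvfile)
    chunk = []
    for row in reader:
        chunk.append(json.dumps(row))
        if len(chunk) == chunk_size:
            jsonfile.write('\n'.join(chunk) + '\n')
            chunk = []
    if chunk:
        jsonfile.write('\n'.join(chunk) + '\n')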

Performance Tips


Here are some performance tips to keep in mind:

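  1. Choose between csv.DictReader and csv.reader deliberately: DictReader returns a dictionary for each row, which maps directly to a JSON object and is more convenient to work with, but it does slightly more work per row than csv.reader, which returns a plain list of values.
  2. Use json.dump instead of json.dumps when writing to a file: json.dump writes the JSON data to the file object directly, so the full serialized string does not have to be built in memory first. It is not necessarily faster, so measure both if speed matters.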
  3. Use with statements: with statements ensure that files are properly closed when we're done with them, which helps prevent file descriptor leaks.

FAQ


Q: What is the difference between csv.reader and csv.DictReader?

A: csv.reader returns a list of values for each row, while csv.DictReader returns a dictionary with column names as keys and row values as values.
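
As a small illustration, here are both readers applied to the same hypothetical in-memory sample. Note that csv.reader treats the header as an ordinary row, while csv.DictReader consumes it as the key names.

```python
import csv
import io

sample = "name,age\nAlice,30\n"

list_rows = list(csv.reader(io.StringIO(sample)))
dict_rows = list(csv.DictReader(io.StringIO(sample)))

print(list_rows)
print(dict_rows)
```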

Q: How do I handle large input CSV files?

A: You can use a streaming approach to read the file in chunks, or use a library like pandas that supports reading large files in chunks.

Q: What is the best way to handle Unicode characters in CSV files?

A: You can specify the encoding when opening the file, such as encoding='utf-8'.

Q: How do I pretty-print JSON output?

A: You can pass indent=4 to the json.dump function to pretty-print the JSON output with 4-space indentation.

Q: What is the difference between json.dump and json.dumps?

A: json.dump writes the JSON data directly to a file, while json.dumps serializes the data to a string.
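
A short sketch of the difference, using an io.StringIO buffer as a stand-in for a real file (an assumption for illustration). Given the same arguments, both functions produce identical JSON text.

```python
import io
import json

data = [{"name": "Alice", "age": "30"}]

# json.dumps returns the JSON text as a string
as_string = json.dumps(data, indent=4)

# json.dump writes the JSON text to a file-like object
buffer = io.StringIO()
json.dump(data, buffer, indent=4)

print(as_string == buffer.getvalue())
```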
