Data & Files

CSV Files: The Simple Data Format Everyone Uses

Understanding CSV files, their structure, common pitfalls, and best practices for creating and parsing comma-separated values.

HandyUtils December 26, 2025 6 min read

CSV (Comma-Separated Values) is one of the most universally supported data formats. Despite being decades old and remarkably simple, it's still everywhere—from database exports to configuration files. Here's everything you need to know about working with CSV files.

What Is CSV?

CSV is a plain text format where:

  • Each line represents one record (row)
  • Fields within a record are separated by delimiters (usually commas)
  • The first line often contains headers (column names)

Basic Example

name,email,age
John Doe,john@example.com,30
Jane Smith,jane@example.com,25
Bob Wilson,bob@example.com,35

When viewed in a spreadsheet:

name email age
John Doe john@example.com 30
Jane Smith jane@example.com 25
Bob Wilson bob@example.com 35
  • Universal: Every spreadsheet program can open it
  • Simple: Plain text, human-readable
  • Lightweight: No formatting overhead
  • Portable: Works on any operating system
  • Easy to generate: Simple string concatenation

The Tricky Parts

CSV looks simple, but edge cases can cause problems.

Problem 1: Commas in Data

What if a field contains a comma?

name,description,price
Widget,"A small, useful gadget",9.99

Solution: Wrap fields containing commas in quotes.

Problem 2: Quotes in Data

What if a field contains quotes?

title,quote
Interview,"She said ""Hello"" to everyone"

Solution: Double the quotes inside quoted fields ("" represents a single ").

Problem 3: Newlines in Data

What if a field spans multiple lines?

name,address
"John Doe","123 Main Street
Apt 4B
New York, NY 10001"

Solution: Wrap in quotes—the entire multi-line content is one field.

Problem 4: Different Delimiters

In some locales, commas are decimal separators. These files use semicolons:

name;price;quantity
Widget;9,99;100
Gadget;14,50;50

This is sometimes called "semicolon-separated values" but still uses .csv extension.

Proper CSV Quoting Rules

According to RFC 4180 (the closest thing to a CSV standard):

  1. Fields MAY be enclosed in quotes
  2. Fields containing commas, newlines, or quotes MUST be quoted
  3. Quotes inside quoted fields must be doubled

Safe vs Unsafe Fields

Safe,Needs Quotes
hello,"hello, world"
simple,"has ""quotes"""
one line,"line 1
line 2"

Character Encoding

CSV files don't specify encoding—this causes many problems.

Common Encodings

Encoding Best For Notes
UTF-8 Most cases Universal support, recommended
UTF-8 with BOM Excel on Windows EF BB BF prefix helps Excel
Latin-1 (ISO-8859-1) Legacy Western Limited characters
Windows-1252 Old Windows files Similar to Latin-1

The Excel UTF-8 Problem

Excel sometimes mangles UTF-8 files. To fix:

Option 1: Add UTF-8 BOM:

with open('file.csv', 'w', encoding='utf-8-sig') as f:
    # utf-8-sig adds the BOM automatically
    writer = csv.writer(f)
    writer.writerows(data)

Option 2: Use Excel's import wizard instead of double-clicking

Line Endings

Different operating systems use different line endings:

  • Unix/Linux/Mac: \n (LF)
  • Windows: \r\n (CRLF)
  • Old Mac (pre-OS X): \r (CR)

Most modern parsers handle all three, but generating files? Use the target platform's convention.

Working with CSV in Code

JavaScript

// Simple parsing (no quoted fields)
const rows = csvText.split('\n').map(row => row.split(','));

// Better: Use a library like PapaParse
Papa.parse(csvText, {
    header: true,
    complete: function(results) {
        console.log(results.data);
    }
});

// Generating CSV
function toCSV(data) {
    const escape = (val) => {
        if (val == null) return '';
        const str = String(val);
        if (str.includes(',') || str.includes('"') || str.includes('\n')) {
            return `"${str.replace(/"/g, '""')}"`;
        }
        return str;
    };
    
    return data.map(row => row.map(escape).join(',')).join('\n');
}

Python

import csv

# Reading
with open('file.csv', 'r', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row['name'], row['email'])

# Writing
with open('output.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'email', 'age'])  # header
    writer.writerow(['John', 'john@example.com', 30])
    writer.writerow(['Jane', 'jane@example.com', 25])

C#

using System.Globalization;
using CsvHelper;

// Reading
using var reader = new StreamReader("file.csv");
using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
var records = csv.GetRecords<Person>().ToList();

// Writing
using var writer = new StreamWriter("output.csv");
using var csv = new CsvWriter(writer, CultureInfo.InvariantCulture);
csv.WriteRecords(people);

CSV Best Practices

Creating CSV Files

  1. Always include a header row
  2. Use UTF-8 encoding (with BOM for Excel)
  3. Quote fields containing special characters
  4. Be consistent with delimiters
  5. Use ISO 8601 for dates (2024-01-15)
  6. Avoid trailing commas

Parsing CSV Files

  1. Use a library—don't write your own parser
  2. Specify encoding explicitly
  3. Handle different line endings
  4. Validate data types after parsing
  5. Trim whitespace (or configure parser to do so)

CSV vs Other Formats

Feature CSV JSON XML Excel
Human readable
Structured
Data types
Nested data
Universal support ⚠️
File size Small Medium Large Medium
Easy to generate

When to Use CSV

  • ✅ Tabular data with simple types
  • ✅ Import/export with spreadsheets
  • ✅ Large datasets (smaller than JSON/XML)
  • ✅ Data exchange between systems

When NOT to Use CSV

  • ❌ Nested/hierarchical data
  • ❌ Complex data types
  • ❌ Configuration files (use JSON/YAML)
  • ❌ When you need a schema

Common Gotchas

Numbers as Text

Excel may interpret numbers as text or vice versa:

zipcode,phone
02134,555-1234

The zipcode 02134 might become 2134. Force text with quotes:

zipcode,phone
"02134","555-1234"

Scientific Notation

Large numbers get converted to scientific notation:

  • 123456789012 becomes 1.23457E+11

Dates

Dates are ambiguous. Is 01/02/03:

  • January 2, 2003? (US)
  • February 1, 2003? (UK)
  • 2001 February 3? (ISO... sort of)

Always use ISO 8601: 2003-01-02

Empty vs Null

name,email,phone
John,,555-1234

Is the email empty string or null? CSV can't distinguish. Define conventions in your documentation.

Summary

CSV's simplicity is both its strength and weakness:

  1. Use quotes for fields with commas, quotes, or newlines
  2. Use UTF-8 encoding (with BOM for Excel compatibility)
  3. Use a library for parsing—edge cases are tricky
  4. Use ISO 8601 for dates
  5. Document your conventions for delimiters, encoding, and null values

Despite its age, CSV remains the lingua franca of data exchange. Understanding its quirks helps you avoid common pitfalls.

Need to validate or convert CSV data? Try our CSV Parser or JSON-CSV Converter!

Related Topics
csv comma-separated values data format spreadsheet import export parsing
Share this article

Continue Reading

Data Formats
JSON Explained: The Universal Data Format

Everything you need to know about JSON: syntax rules, data types, common errors, and why JSON became the web's favorite data format.

Data & Files
Understanding File Sizes: Bits, Bytes, and Beyond

A complete guide to digital storage units from bits to terabytes, including the confusing differences between MB and MiB.