CSV (Comma-Separated Values) is one of the most universally supported data formats. Despite being decades old and remarkably simple, it's still everywhere—from database exports to configuration files. Here's everything you need to know about working with CSV files.
What Is CSV?
CSV is a plain text format where:
- Each line represents one record (row)
- Fields within a record are separated by delimiters (usually commas)
- The first line often contains headers (column names)
Basic Example
name,email,age
John Doe,john@example.com,30
Jane Smith,jane@example.com,25
Bob Wilson,bob@example.com,35
When viewed in a spreadsheet:
| name | age | |
|---|---|---|
| John Doe | john@example.com | 30 |
| Jane Smith | jane@example.com | 25 |
| Bob Wilson | bob@example.com | 35 |
Why CSV Is Popular
- Universal: Every spreadsheet program can open it
- Simple: Plain text, human-readable
- Lightweight: No formatting overhead
- Portable: Works on any operating system
- Easy to generate: Simple string concatenation
The Tricky Parts
CSV looks simple, but edge cases can cause problems.
Problem 1: Commas in Data
What if a field contains a comma?
name,description,price
Widget,"A small, useful gadget",9.99
Solution: Wrap fields containing commas in quotes.
Problem 2: Quotes in Data
What if a field contains quotes?
title,quote
Interview,"She said ""Hello"" to everyone"
Solution: Double the quotes inside quoted fields ("" represents a single ").
Problem 3: Newlines in Data
What if a field spans multiple lines?
name,address
"John Doe","123 Main Street
Apt 4B
New York, NY 10001"
Solution: Wrap in quotes—the entire multi-line content is one field.
Problem 4: Different Delimiters
In some locales, commas are decimal separators. These files use semicolons:
name;price;quantity
Widget;9,99;100
Gadget;14,50;50
This is sometimes called "semicolon-separated values" but still uses .csv extension.
Proper CSV Quoting Rules
According to RFC 4180 (the closest thing to a CSV standard):
- Fields MAY be enclosed in quotes
- Fields containing commas, newlines, or quotes MUST be quoted
- Quotes inside quoted fields must be doubled
Safe vs Unsafe Fields
Safe,Needs Quotes
hello,"hello, world"
simple,"has ""quotes"""
one line,"line 1
line 2"
Character Encoding
CSV files don't specify encoding—this causes many problems.
Common Encodings
| Encoding | Best For | Notes |
|---|---|---|
| UTF-8 | Most cases | Universal support, recommended |
| UTF-8 with BOM | Excel on Windows | EF BB BF prefix helps Excel |
| Latin-1 (ISO-8859-1) | Legacy Western | Limited characters |
| Windows-1252 | Old Windows files | Similar to Latin-1 |
The Excel UTF-8 Problem
Excel sometimes mangles UTF-8 files. To fix:
Option 1: Add UTF-8 BOM:
with open('file.csv', 'w', encoding='utf-8-sig') as f:
# utf-8-sig adds the BOM automatically
writer = csv.writer(f)
writer.writerows(data)
Option 2: Use Excel's import wizard instead of double-clicking
Line Endings
Different operating systems use different line endings:
- Unix/Linux/Mac:
\n(LF) - Windows:
\r\n(CRLF) - Old Mac (pre-OS X):
\r(CR)
Most modern parsers handle all three, but generating files? Use the target platform's convention.
Working with CSV in Code
JavaScript
// Simple parsing (no quoted fields)
const rows = csvText.split('\n').map(row => row.split(','));
// Better: Use a library like PapaParse
Papa.parse(csvText, {
header: true,
complete: function(results) {
console.log(results.data);
}
});
// Generating CSV
function toCSV(data) {
const escape = (val) => {
if (val == null) return '';
const str = String(val);
if (str.includes(',') || str.includes('"') || str.includes('\n')) {
return `"${str.replace(/"/g, '""')}"`;
}
return str;
};
return data.map(row => row.map(escape).join(',')).join('\n');
}
Python
import csv
# Reading
with open('file.csv', 'r', encoding='utf-8') as f:
reader = csv.DictReader(f)
for row in reader:
print(row['name'], row['email'])
# Writing
with open('output.csv', 'w', encoding='utf-8', newline='') as f:
writer = csv.writer(f)
writer.writerow(['name', 'email', 'age']) # header
writer.writerow(['John', 'john@example.com', 30])
writer.writerow(['Jane', 'jane@example.com', 25])
C#
using System.Globalization;
using CsvHelper;
// Reading
using var reader = new StreamReader("file.csv");
using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
var records = csv.GetRecords<Person>().ToList();
// Writing
using var writer = new StreamWriter("output.csv");
using var csv = new CsvWriter(writer, CultureInfo.InvariantCulture);
csv.WriteRecords(people);
CSV Best Practices
Creating CSV Files
- Always include a header row
- Use UTF-8 encoding (with BOM for Excel)
- Quote fields containing special characters
- Be consistent with delimiters
- Use ISO 8601 for dates (
2024-01-15) - Avoid trailing commas
Parsing CSV Files
- Use a library—don't write your own parser
- Specify encoding explicitly
- Handle different line endings
- Validate data types after parsing
- Trim whitespace (or configure parser to do so)
CSV vs Other Formats
| Feature | CSV | JSON | XML | Excel |
|---|---|---|---|---|
| Human readable | ✅ | ✅ | ✅ | ❌ |
| Structured | ❌ | ✅ | ✅ | ✅ |
| Data types | ❌ | ✅ | ✅ | ✅ |
| Nested data | ❌ | ✅ | ✅ | ❌ |
| Universal support | ✅ | ✅ | ✅ | ⚠️ |
| File size | Small | Medium | Large | Medium |
| Easy to generate | ✅ | ✅ | ❌ | ❌ |
When to Use CSV
- ✅ Tabular data with simple types
- ✅ Import/export with spreadsheets
- ✅ Large datasets (smaller than JSON/XML)
- ✅ Data exchange between systems
When NOT to Use CSV
- ❌ Nested/hierarchical data
- ❌ Complex data types
- ❌ Configuration files (use JSON/YAML)
- ❌ When you need a schema
Common Gotchas
Numbers as Text
Excel may interpret numbers as text or vice versa:
zipcode,phone
02134,555-1234
The zipcode 02134 might become 2134. Force text with quotes:
zipcode,phone
"02134","555-1234"
Scientific Notation
Large numbers get converted to scientific notation:
123456789012becomes1.23457E+11
Dates
Dates are ambiguous. Is 01/02/03:
- January 2, 2003? (US)
- February 1, 2003? (UK)
- 2001 February 3? (ISO... sort of)
Always use ISO 8601: 2003-01-02
Empty vs Null
name,email,phone
John,,555-1234
Is the email empty string or null? CSV can't distinguish. Define conventions in your documentation.
Summary
CSV's simplicity is both its strength and weakness:
- Use quotes for fields with commas, quotes, or newlines
- Use UTF-8 encoding (with BOM for Excel compatibility)
- Use a library for parsing—edge cases are tricky
- Use ISO 8601 for dates
- Document your conventions for delimiters, encoding, and null values
Despite its age, CSV remains the lingua franca of data exchange. Understanding its quirks helps you avoid common pitfalls.
Need to validate or convert CSV data? Try our CSV Parser or JSON-CSV Converter!