Hash functions are fundamental to modern computing—from verifying file downloads to securing passwords. But what exactly is a hash, and how do you choose the right one?
What is a Hash Function?
A hash function takes an input of any size and produces a fixed-size output called a hash, digest, or checksum. The same input always produces the same output.
With SHA-256, for example:
"Hello" → 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969
"hello" → 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
Notice how even tiny changes ("Hello" vs "hello") produce completely different hashes. This is called the avalanche effect.
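To check digests like these yourself, here is a minimal sketch assuming SHA-256 and Python's standard `hashlib`. The helper names `sha256_hex` and `differing_bits` are ours, not part of any library. It hashes a few near-identical strings and counts how many of the 256 output bits differ.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(a: str, b: str) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int(sha256_hex(a), 16)
    db = int(sha256_hex(b), 16)
    return bin(da ^ db).count("1")


for word in ("Hello", "Hello!", "hello"):
    print(f"{word!r:10} -> {sha256_hex(word)}")

# For a well-behaved hash, roughly half of the 256 bits are expected to flip.
print(differing_bits("Hello", "hello"))
```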
Properties of Cryptographic Hashes
Good cryptographic hash functions have these properties:
1. Deterministic
Same input → same output, always.
2. Fast to compute
Hashing should be quick (unless you're hashing passwords—more on that later).
3. Pre-image resistance
Given a hash, it should be computationally infeasible to find the original input.
4. Second pre-image resistance
Given an input, it should be infeasible to find a different input with the same hash.
5. Collision resistance
It should be infeasible to find any two different inputs that produce the same hash.
6. Avalanche effect
Small changes in input create dramatically different outputs.
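The first property and the fixed-size output are easy to observe directly. This is a small sketch using SHA-256 from Python's `hashlib`; the sample inputs are arbitrary choices for illustration.

```python
import hashlib

inputs = [b"", b"a", b"x" * 1_000_000]

for data in inputs:
    first = hashlib.sha256(data).hexdigest()
    second = hashlib.sha256(data).hexdigest()
    # Deterministic: hashing the same bytes twice gives the same digest.
    assert first == second
    # Fixed size: 64 hex characters (256 bits) regardless of input length.
    print(len(data), len(first))
```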
MD5: Still Useful, But Not for Security
MD5 (Message Digest 5) was designed in 1991 and produces a 128-bit (32 hex character) hash.
MD5("Hello World") = b10a8db164e0754105b7a99be72e3fe5
MD5's Problems
MD5 is cryptographically broken:
- Collision attacks discovered in 2004
- Now possible to create different files with the same MD5 hash
- In 2012, the Flame malware used an MD5 collision to forge a Microsoft code-signing certificate
When MD5 is Still OK
- Checksums that detect accidental corruption (not malicious tampering)
- Cache keys (non-security purposes)
- Legacy systems (when you can't upgrade)
When to Avoid MD5
- Password hashing
- Digital signatures
- Any security-critical application
- Verifying file authenticity (use SHA-256)
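For the non-security uses listed above, one possible sketch in Python follows. The function name `md5_checksum` is hypothetical, and the `usedforsecurity` flag assumes Python 3.9 or newer.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """MD5 digest for accidental-corruption checks or cache keys only."""
    # usedforsecurity=False (Python 3.9+) documents the intent and lets
    # MD5 work on builds where it is otherwise restricted (e.g. FIPS mode).
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


print(md5_checksum(b"Hello World"))
```

Marking the call as non-security makes the intent visible to reviewers, which helps when an audit flags every use of MD5.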
SHA-1: Deprecated But Everywhere
SHA-1 (Secure Hash Algorithm 1) produces a 160-bit (40 hex character) hash.
SHA-1("Hello World") = 0a4d55a8d778e5022fab701977c5d840bbc486d0
SHA-1's Status
SHA-1 is deprecated for security use:
- Theoretical attacks since 2005
- First practical collision demonstrated in 2017 (SHAttered attack by CWI Amsterdam and Google)
- Still used in Git (moving to SHA-256)
- Browsers reject SHA-1 certificates since 2017
Where You Still See SHA-1
- Git commit hashes (legacy, migration ongoing)
- Some older software verification
- Internal non-security checksums
SHA-256: The Current Standard
SHA-256 is part of the SHA-2 family, producing a 256-bit (64 hex character) hash.
SHA-256("Hello World") = a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
Why SHA-256 is Recommended
- No known practical attacks
- Standard for TLS/SSL certificates
- Used in Bitcoin and blockchain
- Recommended by NIST for security applications
SHA-256 in Practice
File verification:
sha256sum ubuntu-22.04.iso
# Compare with official checksum
Digital signatures: Most code signing and document signing uses SHA-256.
Blockchain: Bitcoin mining involves searching for a block header whose double SHA-256 hash falls below a target value.
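The mining idea can be sketched as a toy proof-of-work loop. This is a simplification under stated assumptions: real Bitcoin hashes an 80-byte block header and compares the result against a numeric target, while this sketch just looks for leading zero hex digits. The function name `mine` and the `difficulty` parameter are ours.

```python
import hashlib
from itertools import count


def mine(block_data: bytes, difficulty: int = 4) -> tuple[int, str]:
    """Find a nonce whose digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        payload = block_data + str(nonce).encode()
        # Bitcoin applies SHA-256 twice; this sketch mirrors that detail.
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


nonce, digest = mine(b"example block", difficulty=4)
print(nonce, digest)
```

Each extra zero digit multiplies the expected work by 16, which is how difficulty is tuned in this toy version.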
SHA-512: When You Need More
SHA-512 produces a 512-bit (128 hex character) hash.
SHA-512("Hello World") = 2c74fd17edafd80e8447b0d46741ee243b7eb74dd2149a0ab1b9246fb30382f27e853d8585719e0e67cbda0daa8f51671064615d645ae27acb15bfb1447f459b
SHA-512 vs SHA-256
| Property | SHA-256 | SHA-512 |
|---|---|---|
| Output size | 256 bits | 512 bits |
| Block size | 512 bits | 1024 bits |
| Speed (64-bit) | Good | Better |
| Speed (32-bit) | Better | Slower |
| Security margin | High | Higher |
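SHA-512 is often faster on 64-bit processors because it works with 64-bit words and processes larger blocks, although CPUs with dedicated SHA-256 instructions can reverse that result. Use it when you need extra security margin or when measurements on your 64-bit hardware favor it.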
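Relative speed depends on the CPU and the library build, so it is worth measuring rather than assuming. This is a rough benchmark sketch with `timeit`; the 1 MiB buffer and 20 repetitions are arbitrary choices.

```python
import hashlib
import timeit

data = b"x" * (1 << 20)  # 1 MiB of sample input

for name in ("sha256", "sha512"):
    hasher = getattr(hashlib, name)
    seconds = timeit.timeit(lambda: hasher(data).digest(), number=20)
    bits = hasher().digest_size * 8
    print(f"{name}: {seconds:.4f} s for 20 MiB, {bits}-bit digest")
```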
Hash Use Cases
Password Storage
Never store plain passwords!
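import hashlib
import bcrypt  # third-party package: pip install bcrypt

# WRONG - a fast, unsalted hash is cheap to brute-force
password_hash = hashlib.sha256(password.encode()).hexdigest()
# RIGHT - use a password-specific algorithm (salted and deliberately slow)
password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt())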
General-purpose hashes (MD5, SHA-1, SHA-2) are too fast for passwords: an attacker with a GPU can test billions of guesses per second. Use bcrypt, Argon2, or scrypt instead.
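If a third-party package is not an option, scrypt is available in Python's standard library on builds linked against OpenSSL 1.1 or newer. The following is a sketch, not a vetted implementation: the cost parameters and the helper names `hash_password` and `verify_password` are assumptions for illustration.

```python
import hashlib
import hmac
import os

# Cost parameters are illustrative assumptions; tune them for your hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); store both alongside the user record."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("wrong guess", salt, key))                   # False
```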
File Integrity
Verify downloads haven't been corrupted or tampered with:
# Download file
wget https://example.com/software.zip
# Verify checksum
echo "expected_hash  software.zip" | sha256sum --check
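The same check can be done from Python when `sha256sum` is not available. This is a sketch using only the standard library; `sha256_file` and `verify_file` are hypothetical helper names, and the expected checksum would come from the publisher's site.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_hex: str) -> bool:
    """Compare against the published checksum, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_file(path), expected_hex.strip().lower())
```

A call such as `verify_file(Path("software.zip"), expected_hash)` returns `True` only when the file's digest matches the published value.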
Digital Signatures
Sign a document:
- Hash the document (SHA-256)
- Sign the hash with the sender's private key
- Recipient hashes the received document and verifies the signature against that hash with the sender's public key
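The hash-then-sign flow above can be sketched with textbook RSA. This is a toy for illustration only: the key is built from two small, publicly known Mersenne primes and there is no padding scheme, so it must never be used for real signatures. All names here (`sign`, `verify`, `digest_as_int`) are ours, and production code would use a vetted cryptography library.

```python
import hashlib

# Toy key built from two known Mersenne primes. Far too small and lacking
# padding, so this is for illustration only, never for real signatures.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def digest_as_int(document: bytes) -> int:
    """SHA-256 the document and reduce it into the toy modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Signer: apply the private exponent to the digest."""
    return pow(digest_as_int(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Recipient: apply the public exponent and compare with a fresh digest."""
    return pow(signature, E, N) == digest_as_int(document)


signature = sign(b"contract v1")
print(verify(b"contract v1", signature))  # True
print(verify(b"contract v2", signature))  # False
```

Only the short digest goes through the expensive public-key operation, which is why the document is hashed first.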
Deduplication
Store files by their hash to avoid duplicates:
from hashlib import sha256

file_hash = sha256(file_content).hexdigest()  # file_content must be bytes
storage_path = f"/data/{file_hash}"
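A slightly fuller sketch of the same idea, assuming a flat directory as the store. The `store` function is hypothetical, and a temporary directory stands in for `/data`.

```python
import hashlib
import tempfile
from pathlib import Path


def store(root: Path, content: bytes) -> Path:
    """Write content under its SHA-256 digest; identical content is stored once."""
    path = root / hashlib.sha256(content).hexdigest()
    if not path.exists():
        path.write_bytes(content)
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = store(root, b"same bytes")
    second = store(root, b"same bytes")
    other = store(root, b"different bytes")
    # Both writes of the same content map to one path, so two files exist.
    print(first == second, len(list(root.iterdir())))
```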
Git Commits
Git identifies commits, trees, and blobs by SHA-1 hash:
commit a1b2c3d4e5f6...
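For blobs, the object ID is the SHA-1 of a short header followed by the file content. This is a sketch of that calculation for repositories using the default SHA-1 object format; `git_blob_id` is our name for the helper.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))
```

The result should match what `git hash-object` prints for a file with the same bytes.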
Caching
Use hashes as cache keys:
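import json
from hashlib import sha256

# sort_keys makes equal parameter sets hash identically; sha256 needs bytes
cache_key = sha256(json.dumps(query_params, sort_keys=True).encode()).hexdigest()
cached_result = cache.get(cache_key)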
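A cache key is only useful if equal parameters always serialize to the same bytes. Here is a self-contained sketch, with a plain dict standing in for the cache and a hypothetical `cache_key` helper:

```python
import hashlib
import json


def cache_key(query_params: dict) -> str:
    """Build a stable cache key from JSON-serializable parameters."""
    # sort_keys and fixed separators make equal dicts serialize identically.
    canonical = json.dumps(query_params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


cache: dict[str, str] = {}
cache[cache_key({"page": 2, "q": "hash functions"})] = "cached result"

# The same parameters in a different order still hit the cache.
print(cache.get(cache_key({"q": "hash functions", "page": 2})))
```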
Hash Collisions Explained
A collision occurs when two different inputs produce the same hash. With finite output size, collisions are mathematically inevitable (pigeonhole principle).
Birthday attack: Finding a collision is easier than finding a specific pre-image. For a hash with n-bit output:
- Pre-image attack: ~2^n attempts
- Collision attack: ~2^(n/2) attempts (birthday paradox)
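This is why a 128-bit hash like MD5 offers at most about 2^64 collision resistance, a workload within reach of well-funded attackers. MD5 is in far worse shape than that bound suggests: structural weaknesses let published attacks find collisions in seconds on ordinary hardware.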
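The birthday bound is easy to observe on a deliberately weakened hash. This sketch truncates SHA-256 to `bits` bits (a hypothetical setup for demonstration) and searches for two inputs with the same truncated digest. With 32 bits, a collision usually appears after tens of thousands of attempts rather than billions.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int) -> int:
    """Keep only the top `bits` bits of SHA-256 to get a deliberately weak hash."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Hash successive counters until two inputs share a truncated digest."""
    seen: dict[int, bytes] = {}
    for attempts in count(1):
        candidate = str(attempts).encode()
        h = truncated_hash(candidate, bits)
        if h in seen:
            return seen[h], candidate, attempts
        seen[h] = candidate


a, b, attempts = find_collision(bits=32)
print(a, b, attempts)  # attempts is typically on the order of 2**16
```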
Salting Hashes
A salt is random data, unique to each stored password, that is combined with the input before hashing and saved alongside the resulting hash:
Without salt:
hash("password") = 5f4dcc3b5aa765d61d8327deb882cf99 (same for everyone)
With salt:
hash("password" + "random123") = 7c6a180b36896a65c3ff4ebf8 (illustrative value; differs per salt)
hash("password" + "xyz789abc") = 9f8b2d5a1c4e7f3b6a8d2c1e4 (illustrative value; differs per salt)
Salting prevents:
- Rainbow table attacks (pre-computed hash lookups)
- Identifying users with same password
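The effect can be sketched in a few lines. Plain SHA-256 is used here only to keep the focus on the salt. For real password storage the article's advice still applies: use bcrypt, Argon2, or scrypt, which take a salt as part of their design. The helper name `salted_sha256` is ours.

```python
import hashlib
import os


def salted_sha256(password: str, salt: bytes) -> str:
    """Hash salt + password. For illustrating salting only."""
    return hashlib.sha256(salt + password.encode()).hexdigest()


# Two users choose the same password but receive different random salts.
alice_salt, bob_salt = os.urandom(16), os.urandom(16)
alice_hash = salted_sha256("password", alice_salt)
bob_hash = salted_sha256("password", bob_salt)

print(alice_hash != bob_hash)  # the stored hashes no longer match
# Verification re-hashes the attempt with the salt stored for that user.
print(salted_sha256("password", alice_salt) == alice_hash)
```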
Choosing the Right Hash
| Use Case | Recommended Hash |
|---|---|
| Passwords | bcrypt, Argon2, scrypt (NOT SHA/MD5) |
| File integrity | SHA-256 |
| Digital signatures | SHA-256 or SHA-512 |
| Checksums (non-security) | SHA-256 (or MD5 if legacy) |
| HMAC | SHA-256 or SHA-512 |
| New systems | SHA-256 minimum |
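For the HMAC row, Python's standard `hmac` module combines a shared secret key with a hash function to authenticate a message. This is a minimal sketch; `sign_message` and `verify_message` are hypothetical helper names.

```python
import hashlib
import hmac


def sign_message(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA256 tag for message under a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_message(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_message(key, message), tag)


tag = sign_message(b"shared-secret", b"amount=100")
print(verify_message(b"shared-secret", b"amount=100", tag))  # True
print(verify_message(b"shared-secret", b"amount=999", tag))  # False
```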
Summary
- MD5: Fast, 128-bit, broken for security, OK for non-security checksums
- SHA-1: 160-bit, deprecated, avoid for new projects
- SHA-256: 256-bit, current standard, use this
- SHA-512: 512-bit, often faster on 64-bit systems, extra security margin
For new projects, default to SHA-256. For passwords, use specialized algorithms like bcrypt or Argon2.