Unicode: One Standard to Rule Them All
- Before Unicode: chaos. ASCII (128 chars), Latin-1 (256), Shift-JIS (Japanese), Big5 (Traditional Chinese)... each incompatible with the others, so text was garbled when files moved between systems
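- The fix: the Unicode Consortium (incorporated 1991) defined one character set that gives every character in every language its own code point; UTF-8, UTF-16, and UTF-32 are encodings of that set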
- Today: 149,813 characters covering 161 scripts (as of Unicode 15.1), from English to emoji to Egyptian hieroglyphs
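To make the "garbled between systems" problem concrete, here is a minimal Python sketch of what happens when bytes written under one encoding are read under another. The sample strings are illustrative only, not taken from any real file.

```python
# The same bytes mean different things under different encodings.
text = "café"

utf8_bytes = text.encode("utf-8")        # b'caf\xc3\xa9'
garbled = utf8_bytes.decode("latin-1")   # each byte is read as one Latin-1 character
print(garbled)                           # cafÃ©

# Decoding with the encoding that was actually used restores the text.
restored = garbled.encode("latin-1").decode("utf-8")
print(restored)                          # café

# ASCII simply has no slot for 'é'.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Japanese text saved as Shift-JIS does not survive being read as Big5.
sjis_bytes = "日本".encode("shift_jis")
mis_decoded = sjis_bytes.decode("big5", errors="replace")
print(mis_decoded != "日本")
```

The round trip only works here because Latin-1 maps every byte to some character; with most other mismatched pairs the damage is not reversible.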
Unicode by the Numbers
- Total possible code points: 1,114,112 (U+0000 to U+10FFFF, i.e. 17 planes of 65,536)
- Currently assigned: ~149,000 (about 13% used)
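- New additions: vary widely by version, from a few hundred (627 in Unicode 15.1) to several thousand (4,489 in Unicode 15.0)
- Emoji alone: 3,782 (as of Unicode 15.1, counting sequences such as skin tone variants)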
- CJK ideographs: 97,680+ (Chinese/Japanese/Korean characters)
- Private use areas: 6,400 slots in the Basic Multilingual Plane (U+E000 to U+F8FF), plus 131,068 more in planes 15 and 16, for custom characters
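The arithmetic behind these figures can be checked in a few lines of Python. This is only a sketch: the assigned-character count is the Unicode 15.1 figure quoted earlier, and the variable names are made up for illustration.

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000          # 65,536

total = PLANES * CODE_POINTS_PER_PLANE   # same as 0x10FFFF + 1
print(f"{total:,}")                      # 1,114,112

assigned = 149_813                       # Unicode 15.1 character count quoted above
print(f"{assigned / total:.1%}")         # 13.4%

# Private use: one range in the BMP, plus almost all of planes 15 and 16
# (the last two code points of every plane are noncharacters).
bmp_pua = 0xF8FF - 0xE000 + 1
plane15_pua = 0xFFFFD - 0xF0000 + 1
plane16_pua = 0x10FFFD - 0x100000 + 1
print(bmp_pua, plane15_pua + plane16_pua)  # 6400 131068
```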
Famous Unicode Characters
- U+0000: NULL—the very first code point (inherited from ASCII)
- U+FFFD: �—the 'replacement character' shown when decoding fails
- U+200B: Zero-width space—invisible but exists (copy this: '' ← it's there!)
- U+1F4A9: 💩—the pile of poo emoji, hotly debated before approval
- U+2603: ☃—the snowman, used to test Unicode support
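These code points can be inspected with Python's standard `unicodedata` module. The sketch below looks up their official names and shows U+FFFD appearing when a decode fails; the byte string is a made-up example of Latin-1 data read as UTF-8.

```python
import unicodedata

# Control characters such as U+0000 have no formal name, hence the default.
for cp in (0x0000, 0xFFFD, 0x200B, 0x1F4A9, 0x2603):
    name = unicodedata.name(chr(cp), "<no name>")
    print(f"U+{cp:04X} {name}")
# U+0000 <no name>
# U+FFFD REPLACEMENT CHARACTER
# U+200B ZERO WIDTH SPACE
# U+1F4A9 PILE OF POO
# U+2603 SNOWMAN

# A lone 0xE9 byte (Latin-1 'é') is not valid UTF-8, so it decodes to U+FFFD.
decoded = b"caf\xe9".decode("utf-8", errors="replace")
print(decoded == "caf\ufffd")            # True

# The zero-width space is invisible but still counts as a character.
hidden = "a\u200bb"
print(len(hidden))                       # 3
```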
Useful Symbols for Developers
- → ← ↑ ↓ (arrows): Great for CLI tools and documentation
- ✓ ✗ (check/cross): Status indicators without images
- • ◦ ‣ (bullets): List styling in plain text
- │ ├ └ ─ (box drawing): Text-mode tables and trees (often called "ASCII art", though these are not ASCII)
- ∞ ≈ ≠ ≤ ≥ (math): Technical documentation
- © ® ™ (legal): Trademark and copyright notices
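As one example of these symbols at work in a CLI tool, here is a small Python sketch that draws a directory-style tree with box-drawing characters. The function `render_tree` and the sample `project` data are hypothetical, written only for this illustration.

```python
def render_tree(tree, prefix=""):
    """Render a nested dict {name: subtree} as a list of tree lines."""
    lines = []
    items = list(tree.items())
    for i, (name, subtree) in enumerate(items):
        last = i == len(items) - 1
        # The last child gets a corner; earlier children get a tee.
        lines.append(f"{prefix}{'└─ ' if last else '├─ '}{name}")
        # Children of a non-last entry keep a vertical bar running past them.
        lines.extend(render_tree(subtree, prefix + ("   " if last else "│  ")))
    return lines


project = {"src": {"main.py": {}, "utils.py": {}}, "README.md": {}}
print("\n".join(render_tree(project)))
# ├─ src
# │  ├─ main.py
# │  └─ utils.py
# └─ README.md
```

The output only lines up in a monospaced font, and the terminal must be able to display UTF-8.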
Unicode Security Gotchas
- Homoglyphs: раypal.com ← That 'a' is Cyrillic! Used in phishing attacks
- Invisible chars: U+200B (zero-width space) can hide in filenames and URLs
- Right-to-left: U+202E reverses text direction—filename 'photo[RLO]gnp.exe' appears as 'photoexe.png'
- Normalization: é can be 1 code point (U+00E9) or 2 (e + combining acute accent U+0301), so visually identical strings compare unequal unless both are normalized first (e.g. to NFC)
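Each of these gotchas can be checked for with Python's standard `unicodedata` module. The helpers below are rough sketches with made-up names, not a complete defense: the script check just reads the first word of each character's Unicode name, and the format-character check also flags legitimate uses such as the joiners inside emoji sequences.

```python
import unicodedata


def scripts_used(s):
    """Rough script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in s if ch.isalpha()}


def format_chars(s):
    """List invisible format characters (category Cf), e.g. U+200B or U+202E."""
    return [f"U+{ord(ch):04X}" for ch in s if unicodedata.category(ch) == "Cf"]


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


spoofed = "p\u0430ypal.com"              # second letter is CYRILLIC SMALL LETTER A
print(sorted(scripts_used(spoofed)))     # ['CYRILLIC', 'LATIN']
print(sorted(scripts_used("paypal.com")))  # ['LATIN']

print(format_chars("photo\u202egnp.exe"))  # ['U+202E']
print(format_chars("read\u200bme.txt"))    # ['U+200B']

print("\u00e9" == "e\u0301")             # False
print(same_text("\u00e9", "e\u0301"))    # True
```

For production use, the confusables data and mixed-script rules published by the Unicode Consortium are a better basis than name prefixes.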
Emoji Evolution
- 1999: Shigetaka Kurita designed a 176-emoji set for NTT DoCoMo (Japan), widely credited as the first emoji, though earlier sets existed
- 2010: Unicode 6.0 officially added emoji—722 characters
- 2015: Skin tone modifiers added (Fitzpatrick scale)
- 2019: Emoji 12.0 added gender-neutral options
- Most used emoji: 😂 (Face with Tears of Joy), over 5% of all emoji use according to the Unicode Consortium's 2021 frequency report
- Unicode 15.0 additions (2022): Shaking face, pink heart, moose, and more
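Skin tones and many newer emoji are not single code points but sequences, which is why one visible emoji can have a surprising length in code. The Python sketch below builds two such sequences by hand; the variable names are illustrative only.

```python
import unicodedata

thumbs_up = "\U0001F44D"
medium_tone = "\U0001F3FD"               # one of the five Fitzpatrick modifiers
toned = thumbs_up + medium_tone          # renders as a single emoji where supported

print(len(toned))                        # 2
print(unicodedata.name(medium_tone))     # EMOJI MODIFIER FITZPATRICK TYPE-4

# "People holding hands" (Emoji 12.0) joins three emoji with zero-width joiners.
ZWJ = "\u200d"
holding_hands = "\U0001F9D1" + ZWJ + "\U0001F91D" + ZWJ + "\U0001F9D1"

print(len(holding_hands))                           # 5 code points
print(len(holding_hands.encode("utf-8")))           # 18 bytes
print(len(holding_hands.encode("utf-16-le")) // 2)  # 8 UTF-16 code units
```

Counting what a user sees as one character requires grapheme cluster segmentation, which Python's standard library does not provide.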
How to Type Unicode
- Windows: Alt + numpad code (Alt+0169 = ©) or Win+. for emoji picker
- Mac: Ctrl+Cmd+Space for emoji picker, or enable Unicode Hex Input keyboard
- Linux (GTK/IBus apps): Ctrl+Shift+U, then the hex code, then Enter or Space (Ctrl+Shift+U 00A9 = ©)
- HTML: &copy; or &#169; or &#xA9; (all render as ©)
- JavaScript: '\u00A9' or '\u{1F600}' for emoji (ES6+ required for latter)
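For comparison, here is how the same characters can be written in Python, along with a sketch of why emoji need the longer escape form: code points above U+FFFF do not fit in one 16-bit unit, so UTF-16 (JavaScript's string representation) stores them as a surrogate pair. The helper `utf16_surrogates` is a hypothetical name used only here.

```python
import html

# Three equivalent spellings of the copyright sign in Python source.
copyright_sign = "\u00A9"
print(copyright_sign == chr(0xA9) == "\N{COPYRIGHT SIGN}")   # True

# Python uses \U with exactly 8 hex digits for code points above U+FFFF.
grinning = "\U0001F600"

# HTML named, decimal, and hex references all decode to the same character.
print(html.unescape("&copy; &#169; &#xA9;"))                 # © © ©


def utf16_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = utf16_surrogates(ord(grinning))
print(f"\\u{high:04X}\\u{low:04X}")                          # \uD83D\uDE00
```

Before ES6, that surrogate pair was the only way to write such a character as an escape in a JavaScript string literal.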
Fun Unicode Blocks to Explore
- Dingbats (U+2700): ✂ ✈ ✉ ✎—classic symbols from Zapf Dingbats font
- Box Drawing (U+2500): Draw text-mode tables and UI borders
- Miscellaneous Symbols (U+2600): ☀ ☁ ☂ ☃ ⚡—weather and more
- Mathematical Operators (U+2200): ∀ ∃ ∈ ∑ ∏ √—for math docs
- Playing Cards (U+1F0A0): 🂡 🂱 🃁 🃑—full deck of cards!
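One way to explore a block is to walk its code point range and print whatever has a name. The Python sketch below does that with `unicodedata`; the helper `named_chars` is made up for this example, the block end points are assumptions based on standard block boundaries, and the results depend on the Unicode version bundled with your Python build.

```python
import unicodedata


def named_chars(start, end):
    """Yield (code point, name) for each named character in [start, end]."""
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            yield cp, name


# First few entries of Miscellaneous Symbols (block starts at U+2600).
for cp, name in named_chars(0x2600, 0x2603):
    print(f"U+{cp:04X} {chr(cp)} {name}")
# U+2600 ☀ BLACK SUN WITH RAYS
# U+2601 ☁ CLOUD
# U+2602 ☂ UMBRELLA
# U+2603 ☃ SNOWMAN

# Playing Cards occupies U+1F0A0 to U+1F0FF, with some gaps left unassigned.
cards = dict(named_chars(0x1F0A0, 0x1F0FF))
print(cards[0x1F0A1])                    # PLAYING CARD ACE OF SPADES
print(len(cards), "named characters in the Playing Cards block")
```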