Unicode: One Standard to Rule Them All

  • Before Unicode: Chaos—ASCII (128 chars), Latin-1 (256), Shift-JIS (Japanese), Big5 (Chinese)... files were garbled between systems
  • The fix: the Unicode Consortium (incorporated 1991) defined one character set for every writing system, stored as bytes via encodings such as UTF-8, UTF-16, and UTF-32
  • As of Unicode 15.1: 149,813 characters covering 161 scripts, from English to emoji to Egyptian hieroglyphs
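
The garbling described above is easy to reproduce. The following is a minimal sketch, assuming the sample word "café" (chosen here for illustration, not taken from any particular system), of what happens when bytes written under one encoding are read under another:

```python
# Encode one string under two encodings, then decode each with the wrong one.
text = "café"

utf8_bytes = text.encode("utf-8")       # é becomes two bytes: C3 A9
latin1_bytes = text.encode("latin-1")   # é becomes one byte:  E9

# Reading UTF-8 bytes as Latin-1 yields classic mojibake.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©

# Reading Latin-1 bytes as UTF-8 is invalid; errors="replace" substitutes U+FFFD.
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(replaced)  # caf followed by the replacement character
```

Storing text as UTF-8 everywhere and decoding it as UTF-8 everywhere avoids both failure modes.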

Unicode by the Numbers

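  • Total possible code points: 1,114,112 (U+0000 to U+10FFFF, 17 planes of 65,536 each)
  • Currently assigned: 149,813 as of Unicode 15.1 (about 13% used)
  • New additions: vary widely by version, from a few hundred to several thousand characters (15.1 added 627; 15.0 added 4,489)
  • Emoji alone: 3,782 (as of Emoji 15.1, counting sequences such as skin-tone and flag variants)
  • CJK ideographs: 97,680 (unified Chinese/Japanese/Korean characters, as of Unicode 15.1)
  • Private use area: 6,400 slots in the BMP (U+E000 to U+F8FF) for custom characters, plus two supplementary private-use planes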
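
The arithmetic behind these figures can be checked directly. This is a small sketch; the constant names are illustrative, and the assigned-character count is the Unicode 15.1 figure quoted earlier:

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE
assert total == 0x10FFFF + 1
print(f"{total:,}")  # 1,114,112

assigned = 149_813  # Unicode 15.1 character count
share = assigned / total * 100
print(f"{share:.1f}%")  # 13.4%

# Size of the private use area in the Basic Multilingual Plane.
bmp_private_use = 0xF8FF - 0xE000 + 1
print(bmp_private_use)  # 6400
```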

Famous Unicode Characters

  • U+0000: NULL—the very first code point (inherited from ASCII)
  • U+FFFD: �—the 'replacement character' shown when decoding fails
  • U+200B: Zero-width space—invisible but exists (copy this: '​' ← it's there!)
  • U+1F4A9: 💩—the pile of poo emoji, hotly debated before approval
  • U+2603: ☃—the snowman, used to test Unicode support

Useful Symbols for Developers

  • → ← ↑ ↓ (arrows): Great for CLI tools and documentation
  • ✓ ✗ (check/cross): Status indicators without images
  • • ◦ ‣ (bullets): List styling in plain text
  • │ ├ └ ─ (box drawing): Text-mode tables and trees (these are Unicode characters, not ASCII)
  • ∞ ≈ ≠ ≤ ≥ (math): Technical documentation
  • © ® ™ (legal): Trademark and copyright notices
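
As one illustration of the box-drawing characters in a CLI tool, here is a minimal sketch of a tree printer. The function `render_tree` and the sample `project` dict are hypothetical names made up for this example:

```python
def render_tree(tree, prefix=""):
    """Return lines drawing a nested dict as a tree with box-drawing characters."""
    lines = []
    entries = list(tree.items())
    for index, (name, children) in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "└─ " if is_last else "├─ "
        lines.append(prefix + connector + name)
        # Keep drawing the vertical bar only while siblings remain below.
        child_prefix = prefix + ("   " if is_last else "│  ")
        lines.extend(render_tree(children, child_prefix))
    return lines


project = {"src": {"main.py": {}, "util.py": {}}, "README.md": {}}
print("\n".join(render_tree(project)))
# ├─ src
# │  ├─ main.py
# │  └─ util.py
# └─ README.md
```

The output lines up only in a monospaced font whose box-drawing glyphs share the cell width of the other characters, which is typical for terminals.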

Unicode Security Gotchas

  • Homoglyphs: раypal.com ← the 'р' and 'а' are Cyrillic (U+0440, U+0430), not Latin! Used in phishing attacks
  • Invisible chars: U+200B (zero-width space) can hide in filenames and URLs
  • Right-to-left: U+202E reverses text direction—filename 'photo[RLO]gnp.exe' appears as 'photoexe.png'
  • Normalization: é can be one code point (U+00E9) or two (e + combining acute accent U+0301); comparing strings without normalizing them first treats the two forms as different
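
The gotchas above can be probed with Python's standard `unicodedata` module. This is a rough sketch rather than a complete defense: the helper names are hypothetical, and the script check is a heuristic based on character names, because the standard library does not expose the Unicode Script property.

```python
import unicodedata


def nfc_equal(a, b):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def format_chars(text):
    """List code points in category Cf (zero-width space, RLO, and similar)."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cf"]


def rough_scripts(text):
    """Guess scripts from the first word of each letter's Unicode name (heuristic)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}


print("\u00e9" == "e\u0301")                   # False: different code points
print(nfc_equal("\u00e9", "e\u0301"))          # True after normalization
print(format_chars("photo\u202egnp.exe"))      # ['U+202E']
print(sorted(rough_scripts("\u0440\u0430ypal")))  # ['CYRILLIC', 'LATIN']
```

A label that mixes scripts or contains format characters is not necessarily malicious, so checks like these are better suited to flagging input for review than to rejecting it outright.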

Emoji Evolution

  • 1999: Shigetaka Kurita designs a 176-emoji set at NTT DoCoMo (Japan), widely credited as the origin of modern emoji
  • 2010: Unicode 6.0 officially added emoji—722 characters
  • 2015: Skin tone modifiers added (Fitzpatrick scale)
  • 2019: Emoji 12.0 and 12.1 expanded gender-neutral options
  • Most used emoji: 😂 (Face with Tears of Joy), more than 5% of all emoji use in the Unicode Consortium's 2021 frequency report
  • Added in Unicode 15.0 (2022): Shaking face, pink heart, moose, and more
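
Skin tone modifiers are separate code points that follow a base emoji, so one visible symbol can span several code points. A minimal sketch, assuming the thumbs-up emoji as the base (picked only for illustration):

```python
THUMBS_UP = "\U0001F44D"
MEDIUM_SKIN_TONE = "\U0001F3FD"  # EMOJI MODIFIER FITZPATRICK TYPE-4

toned = THUMBS_UP + MEDIUM_SKIN_TONE
print(toned)                                # renders as one glyph where supported
print(len(toned))                           # 2 code points
print(len(toned.encode("utf-8")))           # 8 bytes
print(len(toned.encode("utf-16-le")) // 2)  # 4 UTF-16 code units
```

Languages that measure strings in UTF-16 code units, such as JavaScript, report a length of 4 for this single visible symbol.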

How to Type Unicode

  • Windows: Alt + numpad code (Alt+0169 = ©) or Win+. for emoji picker
  • Mac: Ctrl+Cmd+Space for emoji picker, or enable Unicode Hex Input keyboard
  • Linux (GTK apps and IBus): Ctrl+Shift+U, then the hex code, then Enter or Space (Ctrl+Shift+U 00A9 = ©)
  • HTML: &copy; or &#169; or &#xA9; (all render as ©)
  • JavaScript: '\u00A9', or '\u{1F600}' for code points above U+FFFF such as emoji (the braces form requires ES6/ES2015 or later)
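
The same characters can be written in Python source, which has its own escape forms. This sketch shows the Python counterparts of the escapes listed above and uses the standard `html` module to decode the HTML references:

```python
import html

copyright_sign = "\u00A9"        # 4-digit escape for code points up to U+FFFF
grinning = "\U0001F600"          # 8-digit escape for code points above U+FFFF
by_name = "\N{COPYRIGHT SIGN}"   # lookup by official character name
by_number = chr(0xA9)            # built from an integer code point

assert copyright_sign == by_name == by_number

print(f"U+{ord(grinning):04X}")  # U+1F600

# Named, decimal, and hexadecimal references decode to the same character.
decoded = html.unescape("&copy; &#169; &#xA9;")
print(decoded)                   # © © ©
```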

Fun Unicode Blocks to Explore

  • Dingbats (U+2700): ✂ ✈ ✉ ✎—classic symbols from Zapf Dingbats font
  • Box Drawing (U+2500): Create text-mode tables and UI borders
  • Miscellaneous Symbols (U+2600): ☀ ☁ ☂ ☃ ⚡—weather and more
  • Mathematical Operators (U+2200): ∀ ∃ ∈ ∑ ∏ √—for math docs
  • Playing Cards (U+1F0A0): 🂡 🂱 🃁 🃑—full deck of cards!
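
One way to explore a block is to walk its code points and print each character with its official name. This is a minimal sketch; `sample_block` is a hypothetical helper, and the names it prints depend on the Unicode version bundled with the Python build in use:

```python
import unicodedata


def sample_block(start, count=8):
    """Return (label, character, name) for named characters starting at start."""
    rows = []
    for cp in range(start, start + count):
        ch = chr(cp)
        name = unicodedata.name(ch, None)
        if name is not None:  # skip code points without a name (e.g. unassigned)
            rows.append((f"U+{cp:04X}", ch, name))
    return rows


for label, ch, name in sample_block(0x2600, 4):
    print(label, ch, name)
# U+2600 ☀ BLACK SUN WITH RAYS
# U+2601 ☁ CLOUD
# U+2602 ☂ UMBRELLA
# U+2603 ☃ SNOWMAN
```

Passing 0x1F0A0 as the start lists the Playing Cards block in the same way, provided the terminal font has glyphs for it.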