Why URL Encoding Exists

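  • The problem: URLs may contain only a limited set of ASCII characters: letters, digits, and a handful of punctuation marks, some of which have structural meaning
  • The solution: Percent-encoding: each byte of an unsafe character becomes %XX, where XX is the byte's value in hex
  • Example: 'Hello World!' → 'Hello%20World%21' (%20=space, %21=!). JavaScript's encodeURIComponent() leaves ! as-is, which is also valid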
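
To make the mechanism concrete, here is a minimal sketch of a percent-encoder. The function name percentEncode is hypothetical, and the sketch assumes the strictest rule (encode every byte that is not an RFC 3986 unreserved character), which is stricter than what encodeURIComponent() does.

```javascript
// Hypothetical helper, for illustration only: percent-encode every byte
// that is not an unreserved character (A-Z a-z 0-9 - . _ ~).
function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // text as UTF-8 bytes
  let out = '';
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (/[A-Za-z0-9\-._~]/.test(ch)) {
      out += ch; // unreserved: keep as-is
    } else {
      // everything else becomes % plus two uppercase hex digits
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(percentEncode('Hello World!')); // Hello%20World%21
```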

Common Encodings Cheat Sheet

  • Space: %20 works everywhere (+ also means space, but only in form-encoded query strings, never in paths)
  • &: %26 — separates query params, must encode if it's data
  • =: %3D — separates key=value, encode if it's in the value
  • ?: %3F — starts query string, encode if it's data
  • #: %23 — starts fragment, encode to include in data
  • /: %2F — path separator, encode to use in filenames
  • @: %40 — used in emails, encode for query params
  • +: %2B — represents space in forms, encode for literal +
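
The difference between %20 and + can be seen by encoding the same value two ways. This is a sketch; the parameter name q and the sample value are made up for illustration.

```javascript
// Sample value containing several characters from the cheat sheet.
const sample = 'a b&c=d+e';

// encodeURIComponent(): space becomes %20.
const viaComponent = 'q=' + encodeURIComponent(sample);
console.log(viaComponent); // q=a%20b%26c%3Dd%2Be

// URLSearchParams uses form encoding: space becomes +.
const viaForm = new URLSearchParams({ q: sample }).toString();
console.log(viaForm); // q=a+b%26c%3Dd%2Be

// A form-aware parser reads both back to the original value.
console.log(new URLSearchParams(viaComponent).get('q') === sample); // true
console.log(new URLSearchParams(viaForm).get('q') === sample);      // true

// decodeURIComponent() does NOT turn + back into a space.
console.log(decodeURIComponent('a+b')); // a+b
```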

encodeURI vs encodeURIComponent

  • encodeURI(): Encodes a FULL URL. It leaves the reserved characters ; / ? : @ & = + $ , # untouched so the URL structure stays intact
  • encodeURIComponent(): Encodes a PART of a URL. It encodes everything except A-Z a-z 0-9 - _ . ~ ! * ' ( )
  • This tool uses: encodeURIComponent(), the safest choice for query parameter values
  • Common mistake: Using encodeURI() on query values leaves &, = and + unencoded, breaking URLs
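
A short sketch of the common mistake described above. The /search path, the q parameter, and the sample value are assumptions chosen for illustration.

```javascript
const queryValue = 'rock&roll=fun';

// encodeURI() treats & and = as URL structure and leaves them alone.
console.log(encodeURI(queryValue));          // rock&roll=fun
// encodeURIComponent() treats them as data and encodes them.
console.log(encodeURIComponent(queryValue)); // rock%26roll%3Dfun

const good = 'https://example.com/search?q=' + encodeURIComponent(queryValue);
const broken = 'https://example.com/search?q=' + encodeURI(queryValue);

// The correctly encoded URL round-trips the value.
console.log(new URL(good).searchParams.get('q'));      // rock&roll=fun
// The broken URL is parsed as two parameters: q=rock and roll=fun.
console.log(new URL(broken).searchParams.get('q'));    // rock
console.log(new URL(broken).searchParams.get('roll')); // fun
```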

Real-World Scenarios

  • Search queries: 'C++ tutorial' → 'C%2B%2B%20tutorial' (+ must be encoded!)
  • API params: 'filter=status=active' → 'filter=status%3Dactive'
  • Redirect URLs: 'redirect=https://example.com' → 'redirect=https%3A%2F%2Fexample.com'
  • Filenames in URLs: 'Report (Final).pdf' → 'Report%20%28Final%29.pdf' (encodeURIComponent() leaves the parentheses as-is; both forms decode to the same name)
  • Email in URL: 'email=user@domain.com' → 'email=user%40domain.com'
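
The scenarios above can be reproduced with encodeURIComponent(). The helper encodeStrict is a hypothetical name for a common wrapper pattern that additionally encodes ! ' ( ) *, which encodeURIComponent() leaves alone; treat it as a sketch, not as what this tool necessarily does.

```javascript
// Hypothetical wrapper: also percent-encode ! ' ( ) *
function encodeStrict(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent('C++ tutorial'));        // C%2B%2B%20tutorial
console.log('filter=' + encodeURIComponent('status=active'));
// filter=status%3Dactive
console.log('redirect=' + encodeURIComponent('https://example.com'));
// redirect=https%3A%2F%2Fexample.com
console.log('email=' + encodeURIComponent('user@domain.com'));
// email=user%40domain.com

// Parentheses: plain encodeURIComponent() keeps them, the strict wrapper does not.
console.log(encodeURIComponent('Report (Final).pdf')); // Report%20(Final).pdf
console.log(encodeStrict('Report (Final).pdf'));       // Report%20%28Final%29.pdf
```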

Double Encoding Trap

  • Already encoded '%20' becomes '%2520' if encoded again
  • Symptom: Spaces appear as '%20' in the UI instead of actual spaces
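  • Fix: Encode exactly once, at the point where the URL is built. If input may already be encoded, decode it first, then encode once
  • API debugging tip: If data looks double-encoded, decode it repeatedly until it stops changing to see how many layers were applied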
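
A sketch of the trap and of the debugging tip. The helper decodeFully and its maxRounds limit are hypothetical. It is meant as a debugging aid only: data that legitimately contains text like '%20' would be over-decoded.

```javascript
const once = encodeURIComponent('a b');
const twice = encodeURIComponent(once); // the % itself becomes %25
console.log(once);  // a%20b
console.log(twice); // a%2520b

// Hypothetical debugging helper: decode until the string stops changing
// and report how many layers of encoding were removed.
function decodeFully(input, maxRounds = 5) {
  let current = input;
  let rounds = 0;
  while (rounds < maxRounds) {
    let next;
    try {
      next = decodeURIComponent(current);
    } catch {
      break; // malformed escape sequence: stop and keep what we have
    }
    if (next === current) break; // nothing left to decode
    current = next;
    rounds++;
  }
  return { value: current, rounds };
}

console.log(decodeFully(twice)); // { value: 'a b', rounds: 2 }
```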

Characters That DON'T Need Encoding

  • Letters: A-Z, a-z (unreserved, always safe)
  • Numbers: 0-9 (unreserved, always safe)
  • Special unreserved: - _ . ~ (safe in all URL parts)
  • Reserved but allowed: Depends on the URL section. For example, / is literal in a path, and both / and ? may appear unencoded inside a query string
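
One way to check which printable ASCII characters a given encoder leaves alone is to test them one by one. This sketch does so for encodeURIComponent(), which passes through the unreserved set plus ! ' ( ) *.

```javascript
// Collect every printable ASCII character (0x21..0x7E) that
// encodeURIComponent() returns unchanged.
let untouched = '';
for (let code = 0x21; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (encodeURIComponent(ch) === ch) untouched += ch;
}

console.log(untouched);
// !'()*-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
```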

Unicode & International Characters

  • UTF-8 first: '日本' becomes bytes E6 97 A5 E6 9C AC
  • Then percent-encode: '日本' → '%E6%97%A5%E6%9C%AC'
  • Emoji work too: '👍' → '%F0%9F%91%8D' (4 bytes = 12 characters encoded)
  • Fun fact: Punycode is different—it's for domain names, not query strings
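
The two steps (UTF-8 first, then percent-encode) can be inspected directly. The helper name utf8Hex is hypothetical; it only lists the UTF-8 bytes of a string as hex pairs.

```javascript
// Hypothetical helper: list a string's UTF-8 bytes as uppercase hex pairs.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).toUpperCase().padStart(2, '0')
  );
}

console.log(utf8Hex('日本').join(' '));    // E6 97 A5 E6 9C AC
console.log(encodeURIComponent('日本'));   // %E6%97%A5%E6%9C%AC

const thumbs = encodeURIComponent('👍');
console.log(thumbs);        // %F0%9F%91%8D
console.log(thumbs.length); // 12

// Decoding reverses both steps.
console.log(decodeURIComponent('%E6%97%A5%E6%9C%AC')); // 日本
```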

Security Considerations

  • URL encoding is NOT sanitization—XSS payloads survive encoding
  • Decode user input exactly once server-side, then validate the decoded value
  • Watch for path traversal: '../' encoded is '%2E%2E%2F', which is just as dangerous once the server decodes it
  • Log decoded URLs: Encoded logs hide attack patterns
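
A minimal sketch of "decode, then validate" for the path traversal case. The function isSafeRelativePath and its rules are assumptions made for illustration. A real application should also resolve the path against an allowed base directory rather than rely on string checks alone.

```javascript
// Hypothetical check: decode once, then reject traversal attempts.
function isSafeRelativePath(rawPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(rawPath); // decode exactly once
  } catch {
    return false; // malformed percent-escape
  }
  // Leftover %XX sequences suggest double encoding: reject conservatively.
  if (/%[0-9A-Fa-f]{2}/.test(decoded)) return false;
  // Reject any ".." segment, using either slash style.
  return !decoded.split(/[\\\/]/).includes('..');
}

console.log(isSafeRelativePath('reports/2024.pdf'));      // true
console.log(isSafeRelativePath('../etc/passwd'));         // false
console.log(isSafeRelativePath('%2E%2E%2Fetc%2Fpasswd')); // false
console.log(isSafeRelativePath('%252E%252E%252Fetc'));    // false

// Encoding is not sanitization: a payload decodes back unchanged.
const payload = '<script>alert(1)</script>';
console.log(decodeURIComponent(encodeURIComponent(payload)) === payload); // true
```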