URL Encoding Unicode, Emoji, and International Characters
- Non-ASCII characters (accented letters, emoji, CJK characters) are first converted to UTF-8 bytes, then each byte is percent-encoded.
- A single emoji typically expands to 12 or more percent-encoded characters, because most emoji take 4 UTF-8 bytes and each byte becomes a three-character %XX code.
- All major encoding functions handle this automatically — just pass the string and the output is correctly encoded.
ASCII characters that need encoding (spaces and reserved punctuation such as ?, &, and #) have a straightforward percent-encoding: one character becomes one %XX code, while unreserved ASCII (letters, digits, and - . _ ~) is left as-is. Non-ASCII characters (accented letters like é, CJK ideographs like 日, Arabic script, emoji) need more work. They're first converted to UTF-8 byte sequences, then every byte is percent-encoded individually.
The result looks long and opaque, but it's completely standard and all modern servers decode it correctly.
How Non-ASCII Characters Are Encoded
The process:
- Take the character (e.g., é, U+00E9).
- Convert it to its UTF-8 byte sequence: é → 0xC3 0xA9 (two bytes).
- Percent-encode each byte: %C3%A9.
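The three steps above can be sketched by hand in Python. This is a minimal illustration, not a replacement for a library encoder: the function name `percent_encode` is hypothetical, and it assumes the RFC 3986 unreserved characters (letters, digits, `-._~`) are the only ones left unencoded.

```python
# Hypothetical helper illustrating the steps: character -> UTF-8 bytes -> %XX per byte.
# Assumption: only RFC 3986 unreserved ASCII (A-Z a-z 0-9 - . _ ~) passes through as-is.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    out = []
    for byte in text.encode("utf-8"):   # step 2: the UTF-8 byte sequence
        if byte in UNRESERVED:
            out.append(chr(byte))       # unreserved ASCII stays unchanged
        else:
            out.append(f"%{byte:02X}")  # step 3: one uppercase %XX per byte
    return "".join(out)


print(percent_encode("é"))     # %C3%A9
print(percent_encode("café"))  # caf%C3%A9
```

For these inputs, Python's `urllib.parse.quote(text, safe='')` should give the same result, since its always-safe set is the same unreserved characters.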
More examples:
| Character | UTF-8 Bytes | Encoded |
|---|---|---|
| é | C3 A9 | %C3%A9 |
| ü | C3 BC | %C3%BC |
| 日 | E6 97 A5 | %E6%97%A5 |
| 中 | E4 B8 AD | %E4%B8%AD |
| 😀 | F0 9F 98 80 | %F0%9F%98%80 |
| → | E2 86 92 | %E2%86%92 |
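The byte counts in the table follow from UTF-8's fixed ranges: code points up to U+007F take one byte, up to U+07FF two, up to U+FFFF three, and up to U+10FFFF four. Since each byte becomes a three-character `%XX`, the encoded length is three times the byte count. The sketch below (the helper `utf8_length` is hypothetical) derives the byte count from the code point and checks it against Python's own encoder.

```python
def utf8_length(ch: str) -> int:
    """Number of UTF-8 bytes for one code point, using the standard UTF-8 ranges."""
    cp = ord(ch)
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4  # U+10000 .. U+10FFFF


# Non-ASCII characters from the table above.
for ch in ["é", "ü", "日", "中", "😀", "→"]:
    n = utf8_length(ch)
    assert n == len(ch.encode("utf-8"))  # agrees with the real encoder
    print(f"U+{ord(ch):04X} {ch}: {n} bytes -> {3 * n} encoded characters")
```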
Encoding Non-ASCII Characters in Code
All major language encoding functions handle Unicode automatically when you pass a string:
```javascript
// JavaScript
encodeURIComponent('café')   // 'caf%C3%A9'
encodeURIComponent('日本語')  // '%E6%97%A5%E6%9C%AC%E8%AA%9E'
encodeURIComponent('😀')     // '%F0%9F%98%80'
```

```python
# Python
from urllib.parse import quote
quote('café')    # 'caf%C3%A9'
quote('日本語')   # '%E6%97%A5%E6%9C%AC%E8%AA%9E'
```
The encoding functions take care of the UTF-8 conversion step — you don't need to do it manually. Just pass the Unicode string.
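One way to see that the UTF-8 step happens inside the function: Python's `quote` accepts either a `str` or `bytes`, and passing the string gives the same result as passing its UTF-8 bytes yourself, because UTF-8 is `quote`'s default encoding. `unquote` reverses the process, also assuming UTF-8 by default.

```python
from urllib.parse import quote, unquote

text = "日本語"

# quote() converts a str to UTF-8 internally (its default encoding) ...
from_str = quote(text)
# ... so the result matches percent-encoding the UTF-8 bytes directly.
from_bytes = quote(text.encode("utf-8"))
assert from_str == from_bytes == "%E6%97%A5%E6%9C%AC%E8%AA%9E"

# unquote() decodes the %XX bytes as UTF-8 by default, recovering the original text.
assert unquote(from_str) == text

# Decoding with a different charset garbles the text, so both sides must agree on UTF-8.
assert unquote("caf%C3%A9", encoding="latin-1") == "caf\u00c3\u00a9"  # 'cafÃ©'
```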
A Note on Internationalized Domain Names (IDN)
Domain names have their own encoding system for non-ASCII characters: Punycode, applied through the IDNA rules. A domain like münchen.de becomes xn--mnchen-3ya.de. Browsers and other client software perform this conversion before the DNS lookup, so you don't percent-encode domain names yourself.
Percent-encoding applies to the path, query string, and fragment of a URL, not to the scheme or domain. A URL like https://münchen.de/search?q=café therefore gets Punycode on the domain and percent-encoding on the query value, and is actually sent as https://xn--mnchen-3ya.de/search?q=caf%C3%A9.
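A minimal Python sketch of that split, assuming an absolute URL that is not yet encoded and has no username/password part. The host goes through Python's built-in `idna` codec (which implements the older IDNA 2003 rules; the third-party `idna` package implements IDNA 2008), while the path, query, and fragment are percent-encoded. The function name `to_ascii_url` is hypothetical.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Hypothetical helper: Punycode the host, percent-encode everything after it.

    Assumes an absolute, not-yet-encoded URL with a host and no userinfo.
    """
    parts = urlsplit(url)
    # Domain: IDNA/Punycode via Python's built-in codec, never percent-encoding.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Path: percent-encode, keeping '/' as the segment separator (quote's default).
    path = quote(parts.path)
    # Query: split into key/value pairs, then re-encode each key and value.
    query = urlencode(parse_qsl(parts.query, keep_blank_values=True))
    fragment = quote(parts.fragment)
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(to_ascii_url("https://münchen.de/search?q=café"))
# https://xn--mnchen-3ya.de/search?q=caf%C3%A9
```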
Using Emoji in URLs
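Emoji in URLs expand significantly. Most emoji are single code points outside the Basic Multilingual Plane, which take 4 UTF-8 bytes each and so produce 12 characters of percent-encoded output. Emoji built from several code points (skin-tone modifiers, variation selectors, ZWJ sequences) expand further. A URL like /search?q=🍕+recipes becomes /search?q=%F0%9F%8D%95+recipes.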
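A Python sketch of the expansion using `urllib.parse.quote` and `urlencode`. The emoji are written as escape sequences so the exact code points are visible; the multi-code-point examples show how a base character plus a variation selector or skin-tone modifier pushes the output past 12 characters.

```python
from urllib.parse import quote, urlencode

# A single emoji outside the BMP: 4 UTF-8 bytes -> 12 encoded characters.
pizza = quote("\U0001F355")  # 🍕
assert pizza == "%F0%9F%8D%95" and len(pizza) == 12

# Multi-code-point emoji: each code point is encoded separately.
heart = "\u2764\ufe0f"           # ❤️ = HEAVY BLACK HEART + VARIATION SELECTOR-16
thumbs = "\U0001F44D\U0001F3FD"  # 👍🏽 = THUMBS UP SIGN + medium skin-tone modifier
print(len(quote(heart)))   # 18 (3 + 3 bytes)
print(len(quote(thumbs)))  # 24 (4 + 4 bytes)

# Form-style query encoding turns the space into '+'.
print(urlencode({"q": "\U0001F355 recipes"}))  # q=%F0%9F%8D%95+recipes
```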
This is valid and correct: servers decode the bytes back to the original emoji. The encoded form is what's actually transmitted and what typically appears in server logs; the readable form with emoji is a browser display convenience.
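On the receiving end, a query-string parser reverses both steps. As an illustration only (not how any particular server framework is implemented), Python's `parse_qs` decodes the percent-escapes as UTF-8 and turns `+` back into a space:

```python
from urllib.parse import parse_qs, unquote

query = "q=%F0%9F%8D%95+recipes"
params = parse_qs(query)  # decodes %XX as UTF-8 and '+' as a space
assert params == {"q": ["\U0001F355 recipes"]}

# Plain unquote() leaves '+' alone: '+' means space only in form-encoded query strings.
assert unquote("%F0%9F%8D%95+recipes") == "\U0001F355+recipes"
print(params["q"][0])
```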
Use the Mongoose URL Encoder to encode or decode emoji and international text instantly — paste the character and see the percent-encoded form.
Frequently Asked Questions
Does URL encoding work for Arabic and Hebrew (right-to-left text)?
Yes. URL encoding works on bytes, not on the visual representation. Arabic and Hebrew characters are converted to their UTF-8 byte sequences and then percent-encoded, just like any other non-ASCII text. The direction of the text doesn't affect the encoding.
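A short Python check of this point: the Hebrew word שלום is stored in logical order (first letter first), and `quote` encodes the code points in that order, two UTF-8 bytes per letter, regardless of how a browser draws the word right to left.

```python
from urllib.parse import quote, unquote

word = "\u05e9\u05dc\u05d5\u05dd"  # שלום: SHIN, LAMED, VAV, FINAL MEM (logical order)
encoded = quote(word)
print(encoded)  # %D7%A9%D7%9C%D7%95%D7%9D

# Each Hebrew letter (U+05xx) is 2 UTF-8 bytes, so 4 letters -> 8 %XX codes.
assert encoded.count("%") == 8
# The output starts with the first letter in logical order (SHIN).
assert encoded.startswith("%D7%A9")
assert unquote(encoded) == word
```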
What if I need to include a character that doesn't have a UTF-8 encoding?
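Every Unicode character has a UTF-8 encoding. UTF-8 can represent every Unicode scalar value, that is, every code point from U+0000 to U+10FFFF except the 2,048 surrogate code points (U+D800 to U+DFFF) that UTF-16 reserves for surrogate pairs. Surrogates are not characters on their own, so there is no Unicode character that can't be percent-encoded via UTF-8. A lone surrogate in a string is malformed text, and encoders reject it; JavaScript's encodeURIComponent, for example, throws a URIError.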
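A quick Python check, using a hypothetical helper `can_percent_encode`: the highest code point, U+10FFFF, encodes to four bytes, while a lone surrogate code point such as U+D800 is rejected by Python's UTF-8 encoder.

```python
from urllib.parse import quote


def can_percent_encode(text: str) -> bool:
    """Hypothetical helper: True if text can be UTF-8 percent-encoded.

    Python's strict UTF-8 encoder only fails on surrogate code points (U+D800..U+DFFF).
    """
    try:
        quote(text)
    except UnicodeEncodeError:
        return False
    return True


print(quote("\U0010FFFF"))           # %F4%8F%BF%BF
print(can_percent_encode("\ud800"))  # False
```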
Why do some encoded URLs use lowercase hex (%c3%a9) and some use uppercase (%C3%A9)?
Both are valid. RFC 3986 recommends uppercase hex digits, but lowercase is widely accepted. When comparing or normalizing URLs, treat uppercase and lowercase hex as equivalent.
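A minimal normalization sketch (the helper name `normalize_escapes` is hypothetical): uppercase only the hex digits inside `%XX` escapes and leave the rest of the URL untouched, since path and query text outside the escapes can be case-sensitive.

```python
import re
from urllib.parse import unquote

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_escapes(url: str) -> str:
    """Uppercase the hex digits of every %XX escape, as RFC 3986 recommends."""
    return _ESCAPE.sub(lambda m: m.group(0).upper(), url)


a, b = "/caf%c3%a9", "/caf%C3%A9"
assert a != b                                        # the raw strings differ
assert normalize_escapes(a) == normalize_escapes(b)  # equal after normalization
assert unquote(a) == unquote(b) == "/caf\u00e9"      # decoders treat both the same
```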
Can browsers display the original Unicode characters in the address bar even though they're encoded?
Yes. Modern browsers decode and display most non-ASCII characters in the address bar for readability. The underlying request still uses the percent-encoded form. Copy the URL from the address bar and paste it somewhere else to see the encoded version.

