UTF-32 | UTF-16 first | UTF-16 second | UTF-8 first | UTF-8 second | UTF-8 third | UTF-8 fourth |
---|---|---|---|---|---|---|
Definitions | | | | | | |
00000000000000xxxxxxx | 000000000xxxxxxx | | 0xxxxxxx | | | |
0000000000yyyyxxxxxxx | 00000yyyyxxxxxxx | | 110yyyyx | 10xxxxxx | | |
00000zzzzzyyyyxxxxxxx | zzzzzyyyyxxxxxxx | | 1110zzzz | 10zyyyyx | 10xxxxxx | |
uuuuuzzzzzyyyyxxxxxxx | 110110wwwwzzzzzy | 110111yyyxxxxxxx | 11110uuu | 10uuzzzz | 10zyyyyx | 10xxxxxx |
wwww = uuuuu-1 | | | | | | |
Valid ranges | | | | | | |
0x00000000-0x0000007F | 0x0000-0x007F | | 0x00-0x7F | | | |
0x00000080-0x000007FF | 0x0080-0x07FF | | 0xC2-0xDF | 0x80-0xBF | | |
0x00000800-0x00000FFF | 0x0800-0x0FFF | | 0xE0-0xE0 | 0xA0-0xBF | 0x80-0xBF | |
0x00001000-0x0000CFFF | 0x1000-0xCFFF | | 0xE1-0xEC | 0x80-0xBF | 0x80-0xBF | |
0x0000D000-0x0000D7FF | 0xD000-0xD7FF | | 0xED-0xED | 0x80-0x9F | 0x80-0xBF | |
0x0000D800-0x0000DFFF | invalid | | invalid | | | |
0x0000E000-0x0000FFFF | 0xE000-0xFFFF | | 0xEE-0xEF | 0x80-0xBF | 0x80-0xBF | |
0x00010000-0x0003FFFF | 0xD800-0xD8BF | 0xDC00-0xDFFF | 0xF0-0xF0 | 0x90-0xBF | 0x80-0xBF | 0x80-0xBF |
0x00040000-0x000FFFFF | 0xD8C0-0xDBBF | 0xDC00-0xDFFF | 0xF1-0xF3 | 0x80-0xBF | 0x80-0xBF | 0x80-0xBF |
0x00100000-0x0010FFFF | 0xDBC0-0xDBFF | 0xDC00-0xDFFF | 0xF4-0xF4 | 0x80-0x8F | 0x80-0xBF | 0x80-0xBF |
0x00110000-0xFFFFFFFF | invalid | | invalid | | | |
Byte Order Mark (BOM) - Optional first code in a file | | | | | | |
0x0000FEFF | 0xFEFF | | 0xEF | 0xBB | 0xBF | |
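
The bit layouts and valid ranges in the table translate fairly directly into shifts and masks. The following is a minimal Python sketch rather than a reference implementation; the function names `encode_utf8`, `encode_utf16`, and `_check` are hypothetical and chosen only for illustration. It takes a single code point (the UTF-32 value) and produces the UTF-8 bytes or UTF-16 code units, rejecting the two ranges the table marks as invalid.

```python
def _check(cp: int) -> None:
    """Reject the ranges the table marks as invalid."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"invalid code point: {cp:#x}")


def encode_utf8(cp: int) -> bytes:
    """Encode one code point as 1 to 4 UTF-8 bytes."""
    _check(cp)
    if cp <= 0x7F:            # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:           # 110yyyyx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:          # 1110zzzz 10zyyyyx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110uuu 10uuzzzz 10zyyyyx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encode_utf16(cp: int) -> list:
    """Encode one code point as one or two 16-bit code units."""
    _check(cp)
    if cp <= 0xFFFF:
        return [cp]
    # Subtracting 0x10000 is the "wwww = uuuuu-1" step: it removes 1
    # from the top five bits, leaving a 20-bit value to split in half.
    v = cp - 0x10000
    return [0xD800 | (v >> 10),      # 110110wwwwzzzzzy
            0xDC00 | (v & 0x3FF)]    # 110111yyyxxxxxxx


if __name__ == "__main__":
    print(encode_utf8(0x20AC).hex())                  # e282ac
    print([hex(u) for u in encode_utf16(0x1F600)])    # ['0xd83d', '0xde00']
    print(encode_utf8(0xFEFF).hex())                  # efbbbf (the BOM row)
```

Because the encoder only ever emits the shortest form for each range, it cannot produce the lead bytes 0xC0, 0xC1, or 0xF5 and above, nor the overlong second bytes that the "Valid ranges" rows exclude. A decoder would need to check those ranges explicitly.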