Character encodings

From TheAlmightyGuru

ASCII

Char Name Hex Dec Oct Binary Percent Caret Escape
Null [NUL] 00 000 000 00000000  %00 ^@ \0
Start of heading [SOH] 01 001 001 00000001  %01 ^A
Start of text [STX] 02 002 002 00000010  %02 ^B
End of text [ETX] 03 003 003 00000011  %03 ^C
End of transmission [EOT] 04 004 004 00000100  %04 ^D
Enquiry [ENQ] 05 005 005 00000101  %05 ^E
Acknowledge [ACK] 06 006 006 00000110  %06 ^F
Bell [BEL] 07 007 007 00000111  %07 ^G \a
Backspace [BS] 08 008 010 00001000  %08 ^H \b
Horizontal tabulation [HT] 09 009 011 00001001  %09 ^I \t
Line feed [LF] 0A 010 012 00001010  %0A ^J \n
Vertical tabulation [VT] 0B 011 013 00001011  %0B ^K \v
Form feed [FF] 0C 012 014 00001100  %0C ^L \f
Carriage return [CR] 0D 013 015 00001101  %0D ^M \r
Shift out [SO] 0E 014 016 00001110  %0E ^N
Shift in [SI] 0F 015 017 00001111  %0F ^O
Data link escape [DLE] 10 016 020 00010000  %10 ^P
Device control 1 [DC1] 11 017 021 00010001  %11 ^Q
Device control 2 [DC2] 12 018 022 00010010  %12 ^R
Device control 3 [DC3] 13 019 023 00010011  %13 ^S
Device control 4 [DC4] 14 020 024 00010100  %14 ^T
Negative acknowledge [NAK] 15 021 025 00010101  %15 ^U
Synchronous idle [SYN] 16 022 026 00010110  %16 ^V
End of transmission block [ETB] 17 023 027 00010111  %17 ^W
Cancel [CAN] 18 024 030 00011000  %18 ^X
End of medium [EM] 19 025 031 00011001  %19 ^Y
Substitute [SUB] 1A 026 032 00011010  %1A ^Z
Escape [ESC] 1B 027 033 00011011  %1B ^[ \e
File separator [FS] 1C 028 034 00011100  %1C ^\
Group separator [GS] 1D 029 035 00011101  %1D ^]
Record separator [RS] 1E 030 036 00011110  %1E ^^
Unit separator [US] 1F 031 037 00011111  %1F ^_
Space 20 032 040 00100000  %20
! Exclamation point 21 033 041 00100001  %21
" Quotation mark 22 034 042 00100010  %22 \"
# Number sign 23 035 043 00100011  %23
$ Dollar sign 24 036 044 00100100  %24
% Percent sign 25 037 045 00100101  %25
& Ampersand 26 038 046 00100110  %26
' Apostrophe 27 039 047 00100111  %27 \'
( Left parenthesis 28 040 050 00101000  %28
) Right parenthesis 29 041 051 00101001  %29
* Asterisk 2A 042 052 00101010  %2A
+ Plus sign 2B 043 053 00101011  %2B
, Comma 2C 044 054 00101100  %2C
- Hyphen-minus 2D 045 055 00101101  %2D
. Period 2E 046 056 00101110  %2E
/ Slash 2F 047 057 00101111  %2F
0 Digit zero 30 048 060 00110000
1 Digit one 31 049 061 00110001
2 Digit two 32 050 062 00110010
3 Digit three 33 051 063 00110011
4 Digit four 34 052 064 00110100
5 Digit five 35 053 065 00110101
6 Digit six 36 054 066 00110110
7 Digit seven 37 055 067 00110111
8 Digit eight 38 056 070 00111000
9 Digit nine 39 057 071 00111001
: Colon 3A 058 072 00111010  %3A
; Semicolon 3B 059 073 00111011  %3B
< Less-than sign 3C 060 074 00111100  %3C
= Equals sign 3D 061 075 00111101  %3D
> Greater-than sign 3E 062 076 00111110  %3E
? Question mark 3F 063 077 00111111  %3F \?
@ Commercial at 40 064 100 01000000  %40
A Latin capital letter A 41 065 101 01000001
B Latin capital letter B 42 066 102 01000010
C Latin capital letter C 43 067 103 01000011
D Latin capital letter D 44 068 104 01000100
E Latin capital letter E 45 069 105 01000101
F Latin capital letter F 46 070 106 01000110
G Latin capital letter G 47 071 107 01000111
H Latin capital letter H 48 072 110 01001000
I Latin capital letter I 49 073 111 01001001
J Latin capital letter J 4A 074 112 01001010
K Latin capital letter K 4B 075 113 01001011
L Latin capital letter L 4C 076 114 01001100
M Latin capital letter M 4D 077 115 01001101
N Latin capital letter N 4E 078 116 01001110
O Latin capital letter O 4F 079 117 01001111
P Latin capital letter P 50 080 120 01010000
Q Latin capital letter Q 51 081 121 01010001
R Latin capital letter R 52 082 122 01010010
S Latin capital letter S 53 083 123 01010011
T Latin capital letter T 54 084 124 01010100
U Latin capital letter U 55 085 125 01010101
V Latin capital letter V 56 086 126 01010110
W Latin capital letter W 57 087 127 01010111
X Latin capital letter X 58 088 130 01011000
Y Latin capital letter Y 59 089 131 01011001
Z Latin capital letter Z 5A 090 132 01011010
[ Left square bracket 5B 091 133 01011011  %5B
\ Backslash 5C 092 134 01011100  %5C \\
] Right square bracket 5D 093 135 01011101  %5D
^ Circumflex accent 5E 094 136 01011110  %5E
_ Underscore 5F 095 137 01011111  %5F
` Grave accent 60 096 140 01100000  %60
a Latin small letter a 61 097 141 01100001
b Latin small letter b 62 098 142 01100010
c Latin small letter c 63 099 143 01100011
d Latin small letter d 64 100 144 01100100
e Latin small letter e 65 101 145 01100101
f Latin small letter f 66 102 146 01100110
g Latin small letter g 67 103 147 01100111
h Latin small letter h 68 104 150 01101000
i Latin small letter i 69 105 151 01101001
j Latin small letter j 6A 106 152 01101010
k Latin small letter k 6B 107 153 01101011
l Latin small letter l 6C 108 154 01101100
m Latin small letter m 6D 109 155 01101101
n Latin small letter n 6E 110 156 01101110
o Latin small letter o 6F 111 157 01101111
p Latin small letter p 70 112 160 01110000
q Latin small letter q 71 113 161 01110001
r Latin small letter r 72 114 162 01110010
s Latin small letter s 73 115 163 01110011
t Latin small letter t 74 116 164 01110100
u Latin small letter u 75 117 165 01110101
v Latin small letter v 76 118 166 01110110
w Latin small letter w 77 119 167 01110111
x Latin small letter x 78 120 170 01111000
y Latin small letter y 79 121 171 01111001
z Latin small letter z 7A 122 172 01111010
{ Left curly brace 7B 123 173 01111011  %7B
| Vertical line 7C 124 174 01111100  %7C
} Right curly brace 7D 125 175 01111101  %7D
~ Tilde 7E 126 176 01111110  %7E
Delete [DEL] 7F 127 177 01111111  %7F ^?
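
The numeric columns of the table are the same value written in four bases. As a minimal sketch (the function name ascii_row is made up for this illustration), a row's Hex, Dec, Oct, and Binary columns can be reproduced from a character's code like this:

```python
def ascii_row(code: int) -> str:
    """Format one ASCII code as the table's Hex, Dec, Oct and Binary columns.

    ascii_row is a hypothetical helper for illustration only.
    """
    if not 0 <= code <= 0x7F:
        raise ValueError("ASCII codes run from 0 to 127")
    # 2 hex digits, 3 decimal digits, 3 octal digits, 8 binary digits
    return f"{code:02X} {code:03d} {code:03o} {code:08b}"


print(ascii_row(ord("A")))  # 41 065 101 01000001
```

Running it over range(128) would regenerate every numeric column, which is a quick way to spot-check a row.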

Percent encoding

Used to encode reserved characters in URIs and URLs. In the table, characters with a blue background are reserved and must be encoded. Those with a red background are not technically allowed in URIs, but can be included when encoded. Those with a white background may be encoded, but do not have to be.

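For ASCII characters, the encoding is simply a percent sign followed by the character's two-digit hex value. Characters outside ASCII are usually converted to their UTF-8 bytes first, and each byte is then percent-encoded, so a single character becomes several encoded values.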

Any reserved symbols in the name of a file downloaded from a web site may be encoded this way.
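
The rule for ASCII characters can be sketched in a few lines. The helper percent_encode_ascii below is a made-up name for illustration; quote and unquote are the standard library's own functions in urllib.parse, shown here for comparison and for the multi-byte case.

```python
from urllib.parse import quote, unquote


def percent_encode_ascii(ch: str) -> str:
    """Percent-encode a single ASCII character (hypothetical helper)."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not an ASCII character")
    # A percent sign followed by the two-digit hex value from the table
    return f"%{code:02X}"


print(percent_encode_ascii(" "))  # %20
print(percent_encode_ascii("/"))  # %2F

# The standard library encodes non-ASCII characters as UTF-8 bytes,
# one percent-encoded value per byte.
print(quote("é", safe=""))        # %C3%A9

# Decoding reverses the process.
print(unquote("%41%20%7E"))       # A ~
```

In practice quote is the safer choice, since by default it leaves letters, digits, and a few other characters unencoded.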

Caret notation

Used in older systems both to display non-printable characters and to enter them as input, since few have a dedicated key. The most common way to enter one is to hold the Control key on a keyboard, then press the corresponding character. That character is the one whose code differs from the control character's by 64 (hex 40): 64 higher for the codes 00 to 1F, and 64 lower for Delete, which is written ^?. For example, Control+H has the same effect as pressing the Backspace key in many terminals.
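
The Caret column of the table follows a simple pattern: flipping the bit worth 64 (hex 40) in a control character's code gives the printable character shown after the caret. A minimal sketch, with to_caret and from_caret as made-up names for illustration:

```python
def to_caret(code: int) -> str:
    """Return the caret notation for an ASCII control code (hypothetical helper)."""
    if not (0 <= code < 0x20 or code == 0x7F):
        raise ValueError("only control characters have caret notation")
    # Flipping bit 0x40 maps 00-1F to '@'..'_' and 7F to '?'
    return "^" + chr(code ^ 0x40)


def from_caret(notation: str) -> int:
    """Return the control code for a caret notation such as '^H'."""
    if len(notation) != 2 or notation[0] != "^":
        raise ValueError("expected a caret followed by one character")
    code = ord(notation[1].upper()) ^ 0x40
    if not (0 <= code < 0x20 or code == 0x7F):
        raise ValueError("not a valid caret notation")
    return code


print(to_caret(0x08))    # ^H
print(to_caret(0x7F))    # ^?
print(from_caret("^["))  # 27
```

Treating the mapping as a single bit flip is what lets Delete (7F) share the rule with the codes 00 to 1F.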

Escape notation

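Used in C-like languages to "escape" the normal interpretation of a character inside a string or character literal. The backslash tells the compiler not to take the following character at face value, but instead to convert the sequence to the corresponding ASCII value. For example, \n becomes a line feed (hex 0A).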
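
Python shares most of these escape sequences with C, so it can be used to check the Escape column against the Hex column. This is only a sketch: the dictionary name escapes is made up for illustration, and \e and \? are left out because Python does not recognize them.

```python
# Keys are raw strings (the notation as typed); values are the
# single characters the interpreter produces from that notation.
escapes = {
    r"\0": "\0",
    r"\a": "\a",
    r"\b": "\b",
    r"\t": "\t",
    r"\n": "\n",
    r"\v": "\v",
    r"\f": "\f",
    r"\r": "\r",
    r"\\": "\\",
}

for notation, char in escapes.items():
    # Each escape sequence collapses to exactly one character
    print(f"{notation} -> hex {ord(char):02X}")

print(len("a\tb"))  # 3
```

The last line shows that the two characters typed as \t count as one character in the resulting string.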