HTML Character Set
To display HTML pages correctly, browsers must know the character set (encoding) to be used:
Example
<meta charset="UTF-8">
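As a rough illustration of why the declared character set matters, the sketch below decodes the same bytes twice, once with the right encoding and once with the wrong one. It uses Python's built-in codecs; the sample word and the variable names are only assumptions for the demonstration.

```python
# The text as the author wrote it (sample word chosen for illustration).
text = "café"

# The bytes a server would send if the page is saved as UTF-8.
utf8_bytes = text.encode("utf-8")

# A browser that is told the page is UTF-8 recovers the original text.
decoded_correctly = utf8_bytes.decode("utf-8")

# A browser that assumes Windows-1252 turns the two bytes of "é"
# into two unrelated characters.
decoded_wrongly = utf8_bytes.decode("cp1252")

print(decoded_correctly)
print(decoded_wrongly)
```

The second line printed is garbled because each byte is looked up in the wrong table, not because any data was lost.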
HTML Character Set
The HTML5 specification encourages web developers to use the UTF-8 character set!
However, this was not always the case. The character encoding of the early Web was ASCII.
Later, from HTML 2.0 to HTML 4.01, ISO-8859-1 was considered the standard character set.
With XML and HTML5, UTF-8 finally became the standard encoding and solved many character encoding issues.
Initially: ASCII
Computer data is stored in electronic devices as binary codes (01000101).
To standardize the storage of text, the American Standard Code for Information Interchange (ASCII) was created. It assigns a unique binary number to each character it covers: the digits 0-9, uppercase and lowercase English letters (A-Z, a-z), and special characters (such as ! $ + - ( ) @ < > ,).
Since ASCII uses 7-bit characters, it can only represent 128 different characters.
The biggest drawback of ASCII is that it excludes non-English letters.
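The points above can be checked with a few lines of Python. This is a minimal sketch using only built-in functions; the helper name `is_ascii_encodable` is made up for this example.

```python
# The binary code 01000101 quoted above is the ASCII code of "E".
code = ord("E")
bits = format(code, "08b")

# Seven bits give 2**7 possible values, hence 128 characters.
ascii_size = 2 ** 7


def is_ascii_encodable(text):
    """Return True if every character of text exists in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(code, bits, ascii_size)
print(is_ascii_encodable("Hello!"))
print(is_ascii_encodable("naïve"))
```

The last call returns False because "ï" has no ASCII code, which is the drawback described above.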
Today, ASCII is still in use: its 128 characters form the first 128 code points of Windows-1252, ISO-8859-1, and Unicode.
For more in-depth research, please visit our Complete ASCII reference.
In Windows: Windows-1252
Windows-1252 was the default character set in Windows (up to Windows 95).
It is an extension of ASCII, adding international characters.
It uses a full byte (8 bits) to represent 256 different characters.
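A small sketch of what "one byte per character" means in practice, using Python's `cp1252` codec (Python's name for Windows-1252). The sample characters are arbitrary choices for illustration.

```python
samples = ["A", "é", "€"]

# Every character Windows-1252 knows is stored in exactly one byte.
encoded = {ch: ch.encode("cp1252") for ch in samples}

# "A" keeps the same byte value it has in ASCII.
same_as_ascii = encoded["A"] == "A".encode("ascii")

# A character outside the table, such as a Hebrew letter, cannot be stored.
try:
    "א".encode("cp1252")
    hebrew_fits = True
except UnicodeEncodeError:
    hebrew_fits = False

for ch, raw in encoded.items():
    print(ch, raw.hex(), len(raw))
print(same_as_ascii, hebrew_fits)
```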
Since Windows-1252 was the default in Windows, all browsers support it.
For more in-depth research, please visit our Complete Windows-1252 reference.
In HTML 4: ISO-8859-1
ISO-8859-1 was the most commonly used character set in HTML 4.
ISO-8859-1 is an extension of ASCII, adding international characters.
Example
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
In HTML 4, you can specify a character set other than ISO-8859-1 in the <meta> tag:
Example
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8">
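To see what changes when a page declares ISO-8859-8 (Latin/Hebrew) instead of ISO-8859-1, the sketch below decodes one and the same byte with both character sets. The byte value 0xE0 is just a convenient example.

```python
# One byte, as it might appear in an HTML file.
raw = bytes([0xE0])

# In ISO-8859-1 this byte is a Latin letter with a grave accent.
as_latin = raw.decode("iso-8859-1")

# In ISO-8859-8 the same byte is the Hebrew letter alef.
as_hebrew = raw.decode("iso-8859-8")

print(as_latin, as_hebrew)
```

The file itself is identical in both cases. Only the declared character set decides which letter the reader sees.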
All HTML 4 processors also support UTF-8:
Example
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
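Tip: When a browser detects ISO-8859-1, it normally treats it as Windows-1252, because Windows-1252 assigns printable characters (such as the euro sign and curly quotation marks) to the 32 code positions that ISO-8859-1 reserves for control characters.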
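The difference between the two character sets can be listed with a short loop. This is a sketch based on Python's own `iso-8859-1` and `cp1252` codecs; the function name `differing_bytes` is invented for the example, and Python leaves a few Windows-1252 positions unassigned, which the code records as `None`.

```python
def differing_bytes():
    """Map each byte value that the two character sets decode differently
    to its Windows-1252 character (None if Python leaves it unassigned)."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        latin1_char = raw.decode("iso-8859-1")
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252_char = None  # unassigned position in Python's cp1252 codec
        if cp1252_char != latin1_char:
            diffs[value] = cp1252_char
    return diffs


diffs = differing_bytes()
print(len(diffs), hex(min(diffs)), hex(max(diffs)))
print(diffs[0x80])
```

All differences fall in the range 0x80 to 0x9F. Everywhere else the two character sets agree.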
For more in-depth research, please visit our Complete ISO-8859-1 Reference.
In HTML5: Unicode UTF-8
The HTML5 specification encourages web developers to use the UTF-8 character set.
Example
<meta charset="UTF-8">
You can specify a character set other than UTF-8 in the <meta> tag:
Example
<meta charset="ISO-8859-1">
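For illustration only, here is a simplified sketch of how a program could read the declared character set from the start of an HTML file. It is not the algorithm browsers use, which also considers HTTP headers and byte order marks. The function name `sniff_meta_charset` and the Windows-1252 fallback are assumptions made for this example; the 1024-byte limit reflects the HTML requirement that the declaration appear within the first 1024 bytes of the document.

```python
import re

# Matches <meta charset="..."> as well as the charset=... part
# of the older http-equiv form.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(html_bytes, default="windows-1252"):
    """Return the lowercase charset label declared in a <meta> tag,
    or the (assumed) default when no declaration is found."""
    match = META_CHARSET.search(html_bytes[:1024])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()


print(sniff_meta_charset(b'<meta charset="UTF-8">'))
print(sniff_meta_charset(b"<p>No declaration here</p>"))
```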
The Unicode Consortium developed the Unicode Standard, which defines the UTF-8 and UTF-16 encodings, because the ISO-8859 character sets are limited in size and not suited to multilingual environments.
The Unicode Standard covers (almost) all characters, punctuation, and symbols in the world.
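Unlike the one-byte character sets above, UTF-8 uses between one and four bytes per character. The sketch below, with arbitrarily chosen sample characters, shows the byte counts and confirms that plain ASCII text is stored the same way in UTF-8.

```python
samples = ["A", "é", "€", "😀"]

# Number of bytes each character needs in UTF-8.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# Number of bytes in UTF-16 (little-endian, without a byte order mark).
utf16_lengths = {ch: len(ch.encode("utf-16-le")) for ch in samples}

# ASCII text is byte-for-byte identical in UTF-8.
ascii_compatible = "Hello".encode("utf-8") == "Hello".encode("ascii")

for ch in samples:
    print(ch, utf8_lengths[ch], utf16_lengths[ch])
print(ascii_compatible)
```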
Tip: All HTML5 processors support UTF-8, UTF-16, Windows-1252, and ISO-8859. XML processors are required to support UTF-8 and UTF-16.
For more in-depth research, please visit our Complete Unicode Reference.