HTML Character Set

To display HTML pages correctly, browsers must know the character set (encoding) to be used:

Example

<meta charset="UTF-8">
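Why the declaration matters can be seen by decoding the same bytes with two different character sets. The following is a minimal sketch in Python, not part of the original tutorial; the sample word and variable names are illustrative, and "cp1252" is Python's codec name for Windows-1252.

```python
# Text as a server might send it: UTF-8 encoded bytes.
text = "café"
raw = text.encode("utf-8")      # 5 bytes: the accented letter needs two

# Decoding with the declared (correct) character set recovers the text.
right = raw.decode("utf-8")     # "café"

# Decoding with the wrong character set garbles the non-ASCII letter.
wrong = raw.decode("cp1252")    # "cafÃ©"

print(len(raw), right == text, wrong == text)  # 5 True False
```

A browser that guesses the wrong character set shows the garbled form, which is why the page should state its encoding explicitly.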

The HTML5 specification encourages web developers to use the UTF-8 character set.

However, this was not always the case. The character encoding of the early Web was ASCII.

Later, from HTML 2.0 to HTML 4.01, ISO-8859-1 was considered the standard character set.

With XML and HTML5, UTF-8 became the preferred encoding and solved many character encoding issues.

Initially: ASCII

Computers store data as binary codes, for example 01000101.

To standardize the storage of text, the American Standard Code for Information Interchange (ASCII) was created. It assigns a unique binary number to each character it covers: the digits 0-9, uppercase and lowercase letters (A-Z, a-z), special characters (such as ! $ + - ( ) @ < > ,), and a set of control codes.

Since ASCII uses 7-bit characters, it can only represent 128 different characters.

The biggest drawback of ASCII is that it excludes non-English letters.
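The 7-bit limit can be checked directly. This is a small Python sketch under the assumption that Python's built-in "ascii" codec is an acceptable stand-in for the ASCII standard; the variable names are illustrative.

```python
# 'E' is code 69, which is 1000101 in binary (7 bits).
code = ord("E")
bits = format(code, "07b")

# Every ASCII code fits in the range 0..127, so there are 128 characters.
all_ascii = bytes(range(128)).decode("ascii")

# A non-English letter has no ASCII code, so encoding it fails.
try:
    "é".encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(code, bits, len(all_ascii), encodable)  # 69 1000101 128 False
```

Stored in a full byte, the same code gets a leading zero: 01000101.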

Today, ASCII is still in use: its 128 codes are preserved unchanged in Windows-1252, ISO-8859-1, and UTF-8.

For more in-depth research, please visit our Complete ASCII reference.

In Windows: Windows-1252

Windows-1252 was the default character set in Western-language versions of Windows (up to Windows 95).

It is an extension of ASCII, adding international characters.

It uses a full byte (8 bits) per character, which gives room for up to 256 different characters.
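A short Python sketch illustrates the one-byte-per-character layout. It assumes Python's "cp1252" codec as the implementation of Windows-1252; the sample characters and names are chosen only for illustration.

```python
# Each Windows-1252 character occupies exactly one byte.
e_acute = "é".encode("cp1252")   # b'\xe9'
euro = "€".encode("cp1252")      # b'\x80'

# The ASCII range is unchanged, which is what "extension of ASCII" means.
same_as_ascii = "Hello".encode("cp1252") == "Hello".encode("ascii")

# Characters outside the 256 slots, such as Greek omega, cannot be encoded.
try:
    "Ω".encode("cp1252")
    omega_ok = True
except UnicodeEncodeError:
    omega_ok = False

print(len(e_acute), len(euro), same_as_ascii, omega_ok)  # 1 1 True False
```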

Because Windows-1252 was so widely used on Windows, all browsers support it.

For more in-depth research, please visit our Complete Windows-1252 reference.

In HTML 4: ISO-8859-1

ISO-8859-1 is the most commonly used character set in HTML 4.

ISO-8859-1 is an extension of ASCII, adding international characters.

Example

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

In HTML 4, you can specify a character set other than ISO-8859-1 in the <meta> tag:

Example

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8">

All HTML 4 processors also support UTF-8:

Example

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

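Tip: When a browser is told a page uses ISO-8859-1, it usually decodes it as Windows-1252. The two sets differ only in the 32 positions from 0x80 to 0x9F: ISO-8859-1 reserves them for control codes, while Windows-1252 uses them for additional printable characters such as the euro sign and curly quotes.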
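The difference between the two character sets can be sketched in Python by decoding the same byte both ways. The codec names "iso-8859-1" and "cp1252" are Python's; the variable names are illustrative.

```python
# The same byte means different things in the two character sets.
byte = b"\x80"
as_latin1 = byte.decode("iso-8859-1")  # U+0080, an invisible control code
as_cp1252 = byte.decode("cp1252")      # U+20AC, the euro sign

# Outside 0x80-0x9F the two agree, for example on 0xE9 (e with acute accent).
agree = b"\xe9".decode("iso-8859-1") == b"\xe9".decode("cp1252")

print(hex(ord(as_latin1)), hex(ord(as_cp1252)), agree)  # 0x80 0x20ac True
```

Treating the page as Windows-1252 therefore loses nothing printable and correctly displays pages that were mislabeled.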

For more in-depth research, please visit our Complete ISO-8859-1 Reference.

In HTML5: Unicode UTF-8

The HTML5 specification encourages web developers to use the UTF-8 character set.

Example

<meta charset="UTF-8">

You can specify a character set other than UTF-8 in the <meta> tag:

Example

<meta charset="ISO-8859-1">

The Unicode Consortium develops the Unicode Standard, which defines the UTF-8 and UTF-16 encodings, because the ISO-8859 character sets are limited and not suited to multilingual environments.

The Unicode Standard covers (almost) all the characters, punctuation marks, and symbols in the world.
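Unlike the single-byte sets above, UTF-8 uses between one and four bytes per character. The following Python sketch shows this for a few sample characters; the samples and names are chosen for illustration only.

```python
# One character from each UTF-8 length class.
samples = ["A", "é", "€", "😀"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, without a byte order mark
    print(f"U+{ord(ch):04X}: {len(utf8)} byte(s) in UTF-8, {len(utf16)} in UTF-16")

utf8_sizes = [len(ch.encode("utf-8")) for ch in samples]       # [1, 2, 3, 4]
utf16_sizes = [len(ch.encode("utf-16-be")) for ch in samples]  # [2, 2, 2, 4]
```

ASCII characters keep their one-byte form in UTF-8, so plain English text is the same size as in ASCII.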

Tip: All HTML5 processors support UTF-8, UTF-16, Windows-1252, and ISO-8859. XML processors are only required to support UTF-8 and UTF-16.

For more in-depth research, please visit our Complete Unicode Reference.