Question 1

What are HTML entities and why are they needed?

Accepted Answer

HTML entities are strings that begin with an ampersand (&) and end with a semicolon (;). They represent characters that have special meaning in HTML, or that are invisible or difficult to type. They are needed mainly to stop browsers from misinterpreting text as HTML code. For example, if you write a bare less-than sign (<) in a paragraph, the browser may treat it as the start of an HTML tag. If you write the entity '&lt;' instead, the browser displays the symbol rather than interpreting it as markup.

Question 2

What are the most common reserved characters in HTML?

Accepted Answer

The five most common reserved characters that must be encoded in HTML are: the less-than sign (<) encoded as '&lt;', the greater-than sign (>) encoded as '&gt;', the ampersand (&) encoded as '&amp;', the double quote (") encoded as '&quot;', and the single quote (') encoded as '&#39;' (or '&apos;' in HTML5). Encoding these five characters is the foundation of preventing Cross-Site Scripting (XSS) vulnerabilities when inserting user input into an HTML document.
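As a rough illustration of the mapping above, here is a minimal Python sketch. The helper name `escape_html` and the table `_ESCAPES` are hypothetical, made up for this example, and the choice of '&#39;' for the single quote is an assumption (Python's own `html.escape` emits '&#x27;' instead).

```python
# Hypothetical helper for illustration only; in real code, prefer the
# standard library's html.escape or your template engine's auto-escaping.
_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",  # assumption: numeric form chosen for the single quote
}


def escape_html(text: str) -> str:
    """Replace the five reserved characters with their entities."""
    # Working character by character means an ampersand produced by one
    # replacement is never escaped a second time.
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(escape_html('<script>alert("hi")</script>'))
```

Note that input which already contains entities is escaped again (the ampersand in '&lt;' becomes '&amp;'), which is usually what you want for untrusted user input.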

Question 3

What is the difference between named, decimal, and hex numeric entities?

Accepted Answer

There are three ways to represent the same character using HTML entities. A 'named entity' uses a human-readable abbreviation, such as '&copy;' for the copyright symbol (©). A 'decimal numeric entity' uses the character's decimal code point in the Unicode standard, preceded by '&#', such as '&#169;'. A 'hexadecimal numeric entity' uses the hexadecimal code point, preceded by '&#x', such as '&#xA9;'. Named entities are easier to read in source code, but numeric entities are supported by all browsers and can represent any Unicode character, even those without a named alias.
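The equivalence of the three forms can be checked with Python's standard `html` module. This is only a sketch: `numeric_entities` is a hypothetical helper name introduced for the example.

```python
import html

# The three spellings of the copyright symbol from the answer above.
forms = ["&copy;", "&#169;", "&#xA9;"]

# html.unescape decodes named, decimal, and hexadecimal references alike.
decoded = [html.unescape(form) for form in forms]


def numeric_entities(ch: str) -> tuple[str, str]:
    """Return the (decimal, hexadecimal) numeric entity for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


if __name__ == "__main__":
    print(decoded)                 # all three decode to the same character
    print(numeric_entities("©"))   # ('&#169;', '&#xA9;')
```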

Question 4

Should I encode all non-ASCII characters on a modern webpage?

Accepted Answer

Generally, no, as long as your document is served with UTF-8 encoding. Modern web browsers handle UTF-8 reliably, so you can type or paste characters such as accented letters, emojis, and symbols directly into your HTML without encoding them. However, if you are forced to use an older character encoding like ISO-8859-1, or if you need to keep the HTML source purely ASCII for legacy systems, then encoding non-ASCII characters as entities becomes necessary to prevent them from displaying as garbled text (mojibake).
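For the ASCII-only case, one possible approach in Python is the built-in `xmlcharrefreplace` error handler, which swaps each non-ASCII character for a decimal numeric entity. The function name `to_ascii_entities` is hypothetical, and the sketch assumes the five reserved characters have already been escaped separately.

```python
import html


def to_ascii_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity."""
    # "xmlcharrefreplace" turns each unencodable character into &#NNN;
    # and leaves plain ASCII (including <, >, and &) untouched.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    encoded = to_ascii_entities("café")
    print(encoded)                 # caf&#233;
    print(html.unescape(encoded))  # café
```

The output is pure ASCII, so it survives being served or stored under a legacy encoding, and a browser decodes the entities back to the original characters.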