Character variants in Unicode

In Unicode, there are separate code points for fullwidth characters. Here’s a comparison between normal ASCII characters and their fullwidth counterparts (the normal one is written first): A vs. Ａ, 1 vs. １, ! vs. ！.
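The fullwidth forms of printable ASCII sit at a fixed distance from the originals: U+0021 through U+007E map to U+FF01 through U+FF5E, an offset of 0xFEE0. Here is a minimal Python sketch that prints such a comparison. The function name `to_fullwidth` is my own invention for illustration, not part of any library.

```python
# Offset between printable ASCII (U+0021..U+007E) and the
# fullwidth forms (U+FF01..U+FF5E).
FULLWIDTH_OFFSET = 0xFEE0


def to_fullwidth(text: str) -> str:
    """Replace printable ASCII characters with their fullwidth variants.

    Characters outside U+0021..U+007E (including the space) are left as-is.
    """
    result = []
    for ch in text:
        code_point = ord(ch)
        if 0x21 <= code_point <= 0x7E:
            result.append(chr(code_point + FULLWIDTH_OFFSET))
        else:
            result.append(ch)
    return "".join(result)


# Print the normal character first, then its fullwidth counterpart.
for ch in "A1!~":
    wide = to_fullwidth(ch)
    print(f"{ch} U+{ord(ch):04X}  vs.  {wide} U+{ord(wide):04X}")
```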


Superscript characters like ² are also display variants of normal characters like 2.
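Unicode itself records this relationship: display variants carry a compatibility decomposition pointing back to the normal character, and NFKC normalization applies it. The sketch below uses Python's standard `unicodedata` module. The helper name `fold_variants` is mine, chosen only for this example.

```python
import unicodedata


def fold_variants(text: str) -> str:
    """Fold compatibility variants (fullwidth, superscript, ...) to plain forms."""
    return unicodedata.normalize("NFKC", text)


# Inspect a superscript and a fullwidth character.
for ch in ["\u00B2", "\uFF21"]:  # SUPERSCRIPT TWO, FULLWIDTH LATIN CAPITAL LETTER A
    print(
        ch,
        unicodedata.name(ch),
        "| decomposition:", unicodedata.decomposition(ch),
        "| folded:", fold_variants(ch),
    )
```

Note that folding is lossy: `fold_variants("a² + b² = c²")` gives `"a2 + b2 = c2"`, which no longer means the same thing.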

Another amusing thing is the existence of language-specific characters. Examples are the Greek capital letter eta (Η, U+0397) and the Cyrillic capital letter en (Н, U+041D). On my machine, they look exactly like the Latin capital letter H (which is ASCII 72, or U+0048).
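To see that these really are three different characters despite looking alike, one can ask Python's `unicodedata` module for their names. This is just a small sketch; `describe` is a made-up helper name. Unlike the display variants above, these letters have no compatibility decomposition, so NFKC normalization leaves them alone. Their lowercase forms also differ, which hints at why they are kept apart.

```python
import unicodedata

# Latin H, Greek eta, Cyrillic en: identical-looking in many fonts.
LOOKALIKES = ["\u0048", "\u0397", "\u041D"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in LOOKALIKES:
    unchanged = unicodedata.normalize("NFKC", ch) == ch
    print(f"{ch}  {describe(ch)}  lowercase: {ch.lower()}  NFKC unchanged: {unchanged}")
```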

I actually have mixed feelings about including display variants in a character set. In light of HTML and various text-formatting tools (TeX, office suites), display variants can be regarded as a waste of code points. For example, in HTML superscripts can be produced with the <sup> tag, and specific fonts (fullwidth, for example) can be chosen using CSS (or the old-style <font> tag). As for language variants, HTML again renders them unnecessary because there is the “lang” (or “xml:lang”) attribute.

However, variants have some merits. One obvious use is in plain text files. For example, with the character “²” I can write “a² + b² = c²” nicely in a plain text file. Another benefit is space efficiency: “²” is a single character, while “<sup>2</sup>” takes twelve.
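The space argument is easy to check. The sketch below counts characters and UTF-8 bytes for both spellings of the formula. UTF-8 is my assumption here; other encodings would give different byte counts. The variable names are only illustrative.

```python
# The same formula written with the superscript character and with HTML markup.
plain = "a\u00B2 + b\u00B2 = c\u00B2"
markup = "a<sup>2</sup> + b<sup>2</sup> = c<sup>2</sup>"

for label, text in [("plain text", plain), ("HTML markup", markup)]:
    n_chars = len(text)
    n_bytes = len(text.encode("utf-8"))  # assumes UTF-8 storage
    print(f"{label}: {n_chars} characters, {n_bytes} bytes")
```

In UTF-8 the “²” itself occupies two bytes, so the saving per superscript is ten bytes rather than eleven.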

What I hate about language variants is that they conflict with one major theme of the Unicode work: CJK (Chinese, Japanese, Korean) character unification. In Unicode, there is no such thing as a Japanese 人, a Chinese 人, and a Korean 人. There is only one character for all three languages: 人. This is in spite of differences in how some of the characters are drawn! Thus, it is not possible to convey the difference in a plain text file.

For example, here is the CJK character for “now”, displayed differently (if your computer is set up correctly) because of the “lang” attribute: 今 (Japanese) vs. 今 (Chinese). Both are U+4ECA. On my computer it looks like this:

[Image: Japanese vs. Chinese rendering of 今]

See the HTML source code for more info.
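The page's actual source is not reproduced here, so the following is only a sketch of what such markup might look like, generated with Python for consistency with the other examples. The helper name `lang_span` and the choice of a <span> element are my assumptions. “ja” and “zh” are the standard language tags for Japanese and Chinese.

```python
from html import escape

NOW = "\u4ECA"  # the CJK character for "now"; one code point for all languages


def lang_span(text: str, lang: str) -> str:
    """Wrap text in a <span> carrying a lang attribute (hypothetical helper)."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'


japanese = lang_span(NOW, "ja")
chinese = lang_span(NOW, "zh")

print(japanese)
print(chinese)
print(f"Same code point in both: U+{ord(NOW):04X}")
```

The two strings differ only in the attribute value. The character data is identical, so whether the glyphs come out differently depends on the browser and the installed fonts.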

