Kanji as a form of data compression

With kanji, many ideas can be expressed in just a few characters. For example, here is how the 12 months are written in several writing systems and languages:

Kanji Hiragana Roomaji English Indonesian
一月 いちがつ ichigatsu January Januari
二月 にがつ nigatsu February Februari
三月 さんがつ sangatsu March Maret
四月 しがつ shigatsu April April
五月 ごがつ gogatsu May Mei
六月 ろくがつ rokugatsu June Juni
七月 しちがつ shichigatsu July Juli
八月 はちがつ hachigatsu August Agustus
九月 くがつ kugatsu September September
十月 じゅうがつ juugatsu October Oktober
十一月 じゅういちがつ juuichigatsu November November
十二月 じゅうにがつ juunigatsu December Desember
Average character count 2.17 4.17 8.83 6.17 6.25

Note that the average character count drops from roomaji to hiragana. That is expected, since each hiragana symbol represents a mora, which for this discussion can be regarded as a syllable. In roomaji, most syllables must be written with two or more characters. Hiragana can therefore be thought of as compressing roomaji. As a symbol, a hiragana character is higher level than a roomaji letter.

The average character count drops again when we go from hiragana to kanji. Kanji is even higher level than hiragana: each kanji expresses a certain idea. Because most kanji expand to more than one character when written in hiragana, kanji can be thought of as compressing hiragana.
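The averages in the table can be recomputed with a short script. This is only a sketch: the dictionary `months` and the helper `average_length` are names made up for this illustration, and "character" is taken to mean one Unicode code point, which is an assumption about how the table was counted.

```python
# Month names in each writing system, copied from the table above.
months = {
    "kanji": ["一月", "二月", "三月", "四月", "五月", "六月",
              "七月", "八月", "九月", "十月", "十一月", "十二月"],
    "hiragana": ["いちがつ", "にがつ", "さんがつ", "しがつ", "ごがつ", "ろくがつ",
                 "しちがつ", "はちがつ", "くがつ", "じゅうがつ",
                 "じゅういちがつ", "じゅうにがつ"],
    "roomaji": ["ichigatsu", "nigatsu", "sangatsu", "shigatsu", "gogatsu",
                "rokugatsu", "shichigatsu", "hachigatsu", "kugatsu",
                "juugatsu", "juuichigatsu", "juunigatsu"],
    "english": ["January", "February", "March", "April", "May", "June",
                "July", "August", "September", "October", "November",
                "December"],
    "indonesian": ["Januari", "Februari", "Maret", "April", "Mei", "Juni",
                   "Juli", "Agustus", "September", "Oktober", "November",
                   "Desember"],
}


def average_length(words):
    """Mean number of characters (Unicode code points) per word."""
    return sum(len(w) for w in words) / len(words)


# Average character count per writing system, rounded as in the table.
averages = {system: round(average_length(names), 2)
            for system, names in months.items()}

for system, avg in averages.items():
    print(f"{system:10s} {avg:.2f}")
```

Note that this counts characters, not bytes. In an encoding such as UTF-8, a kanji or hiragana character takes more bytes than a Latin letter, so the comparison here is about symbols on the page rather than storage size.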

I’ve heard people say, “kanji is sooo ancient. They should abolish it and replace it with something simpler and modern like the Latin alphabet.” This usually boils down to an unwillingness to memorize lots of high-level symbols.

However, kanji are logograms: symbols that stand for a word or an idea rather than a sound (only some of them began as pictograms). What these people don’t realize is that they use such symbols too. Ever seen 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0? Great, let’s abolish them. Then we can all have fun writing “sixty five thousand five hundred thirty six” or “enam puluh lima ribu lima ratus tiga puluh enam”.

Anyway, it is natural to ask, “can we define even higher-level elements?” I don’t see that happening in natural language, but there is one language in which simpler concepts (encoded in symbols) are used to successively build more complex ones: mathematics.

In modern mathematics, everything starts with set theory. There we see symbols like “{”, “}”, “,”, and “⊆”. From sets, we can define things such as the natural numbers, and naturally (no pun intended) new symbols like “0” and “1” appear.
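One common way to build the natural numbers out of sets is the von Neumann construction, where 0 is the empty set and each successor is n ∪ {n}. The post does not say which construction it has in mind, so treat this as one possible illustration. The function names `successor`, `von_neumann`, and `show` are invented for this sketch.

```python
def successor(n):
    """Return the successor of n, defined as n ∪ {n}."""
    return n | frozenset([n])


def von_neumann(k):
    """Build the set that encodes the natural number k."""
    n = frozenset()  # 0 is the empty set
    for _ in range(k):
        n = successor(n)
    return n


def show(s):
    """Write a nested set using only '{', '}' and ','."""
    # Shorter elements first, so smaller numbers are printed first.
    parts = sorted((show(x) for x in s), key=len)
    return "{" + ", ".join(parts) + "}"


for k in range(4):
    print(k, "=", show(von_neumann(k)))
```

Written out in set symbols alone, even small numbers get long very quickly, which is exactly why a higher-level symbol such as “3” is worth having. In this encoding the set for k has k elements, and a smaller number is both an element and a subset (⊆) of a larger one.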

Going even higher, there is calculus, in which symbols like “∫” appear. Calculus is so high level that, using vector calculus, all classical electromagnetic phenomena can be described by only four equations (the so-called “Maxwell’s equations”).

I think it is astonishing that, using the even higher-level symbols of Clifford algebra, Maxwell’s equations can be written as a single equation.
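For comparison, here are both forms side by side. This is a sketch in natural units (c = 1, Heaviside-Lorentz), which is a choice made here and not something the post specifies. Sign and unit conventions for the Clifford algebra form vary between authors. In the single equation, ∇ is the spacetime vector derivative, F is the electromagnetic field, I is the unit pseudoscalar, and J is the spacetime current.

```latex
% Vector calculus form: four equations
\begin{align}
\nabla \cdot \mathbf{E} &= \rho \\
\nabla \cdot \mathbf{B} &= 0 \\
\nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} &= 0 \\
\nabla \times \mathbf{B} - \frac{\partial \mathbf{E}}{\partial t} &= \mathbf{J}
\end{align}

% Clifford (geometric) algebra form: one equation
\begin{equation}
\nabla F = J, \qquad F = \mathbf{E} + I\,\mathbf{B}
\end{equation}
```

The compression is the same kind as before: the single symbol F packs the electric and magnetic fields together, and the single product ∇F packs the divergence and curl operations together.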
