Posts Tagged ‘Kanji’

Configuring the correct Japanese fonts for Windows GTK applications

2008 April 13

In a previous post, I discussed how win32 GTK/GTK+ programs are smart enough to choose a Japanese translation by default if your system’s language is set to Japanese. However, there’s one big shame that I concealed: they will not choose the fonts correctly.

Related to this problem is how the Unicode standard handles Japanese and Chinese characters. You see, the characters known as kanji, used in Japan, historically come from China. In fact, kanji literally means “Han characters”. But that happened more than a thousand years ago. Time always brings change, and now many characters are drawn differently in each country.

In the image below, you can see how some Japanese characters (black) differ from their Chinese counterparts (blue):

Difference between Japanese and Chinese kanji glyphs

You can see that even the stroke count can differ!

Unicode, in its effort called Han unification, insisted that Japanese, traditional Chinese, and Korean characters that were historically the same must share a single code point. So there can’t be one Unicode character for the Japanese version of ‘close’ and another for the Chinese version. Any differences must then be achieved through fonts. So yes, in the screenshot above, the Japanese and Chinese characters are actually the same Unicode characters, just rendered in OpenOffice.org with different fonts. And yes, that means you can’t display both Chinese and Japanese text correctly in a plain text document (which can only use one font for the whole file), unless you happen to use only characters that are drawn the same way in both countries.
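To make the single code point idea concrete, here is a minimal Python sketch. It assumes the character in question is 近 (the kanji of “chikai”, meaning close or near); the variable names are only for illustration. Whatever language the surrounding text is in, the character is stored as the same number, and nothing in it says “Japanese” or “Chinese”.

```python
import unicodedata

# Assumed example character: the kanji of "chikai" (close, near).
ch = "近"

# One code point and one Unicode name, regardless of language.
code_point = f"U+{ord(ch):04X}"
name = unicodedata.name(ch)

print(code_point)  # U+8FD1
print(name)        # CJK UNIFIED IDEOGRAPH-8FD1
```

The choice between the Japanese and the Chinese shape is therefore left entirely to whichever font ends up drawing that code point.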

Now, back to GTK. GTK programs use a configuration file called pango.aliases to select their fonts. Here’s a sample line:

sans = "arial,browallia new,mingliu,simhei,gulimche,ms gothic"

That line means that if a character must be drawn on screen in a sans-serif font (“sans”), GTK first tries to display it using “arial”, the first font in the list. If the character isn’t in the system’s Arial font, it tries “browallia new”. If that fails, it tries the next one, “mingliu”. And so on.

The problem comes when a static list like that meets the intricacies of Unicode’s Han unification. Probably for no particular reason, the configuration file of Windows GTK programs puts Chinese fonts (mingliu etc.) before Japanese fonts (ms gothic etc.). So there you have it: a Japanese-language user interface displayed using “Chinese” characters:
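The fallback walk described above can be sketched in a few lines of Python. This is only a simulation, not how Pango is actually implemented: the helper names `parse_alias_line` and `pick_font` are hypothetical, and the `COVERAGE` table is invented for illustration (real fonts cover far more characters).

```python
def parse_alias_line(line):
    """Split one pango.aliases-style line into (alias, [font, ...])."""
    alias, _, fonts = line.partition("=")
    names = [f.strip() for f in fonts.strip().strip('"').split(",")]
    return alias.strip(), names


def pick_font(char, fonts, coverage):
    """Return the first font in the list whose glyph set contains char."""
    for font in fonts:
        if char in coverage.get(font, set()):
            return font
    return None  # no listed font can draw the character


# Hypothetical glyph coverage, invented for this example only.
COVERAGE = {
    "arial": set("ABCabc"),
    "mingliu": set("近"),
    "ms gothic": set("近あ"),
}

alias, fonts = parse_alias_line(
    'sans = "arial,browallia new,mingliu,simhei,gulimche,ms gothic"'
)
print(pick_font("a", fonts, COVERAGE))   # arial
print(pick_font("近", fonts, COVERAGE))  # mingliu
print(pick_font("あ", fonts, COVERAGE))  # ms gothic
```

Note that a kanji present in both “mingliu” and “ms gothic” is drawn with whichever font comes first in the list, while a kana that only the Japanese font contains still falls through to “ms gothic”.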

Inkscape using Japanese translation but Chinese characters!

If you’re like me, then that extra dot stroke on “chikai” will really get on your nerves.

The solution is a simple exercise in find and replace. Find all files named pango.aliases on your hard drive; they will most probably be inside your Program Files folder. Each installed GTK program can have its own, but programs can also use the one from a “shared” GTK. If you already know where your GTK programs are, the file is located in the etc\pango subfolder. Once found, replace its content with my hand-crafted version:

courier = "courier new,MS Mincho" 

tahoma = "tahoma,MS PGothic,browallia new,mingliu,simhei,gulimche,ms gothic,kartika,latha,mangal"
sans = "arial,MS PGothic,browallia new,mingliu,simhei,gulimche,ms gothic,kartika,latha,mangal"
serif = "times new roman,MS PMincho,angsana new,mingliu,simsun,gulimche,ms gothic,kartika,latha,mangal"
mono = "courier new,MS Mincho,courier monothai,mingliu,simsun,gulimche,ms gothic,kartika,latha,mangal"
monospace = "courier new,MS Mincho,courier monothai,mingliu,simsun,gulimche,ms gothic,kartika,latha,mangal"

Now your configuration will prefer Japanese fonts rather than Chinese ones. Talk about font discrimination! Here’s the result:

Inkscape using Japanese translation and the correct fonts

Ah, Japanese translation in Japanese fonts. No more wrong fonts. That feels better.
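If you have many GTK programs installed, the search step and a quick sanity check can be scripted. The following Python sketch walks a folder, finds every file named pango.aliases, and prints the alias lines that still list a Chinese font before any Japanese one. The function names, the two font-name sets, and the default folder are my own assumptions for illustration; Korean fonts are simply ignored here.

```python
import os

# Assumed classification of the font names used in the lists above.
JAPANESE = {"ms gothic", "ms pgothic", "ms mincho", "ms pmincho"}
CHINESE = {"mingliu", "simhei", "simsun"}


def find_pango_aliases(root):
    """Yield the path of every file named pango.aliases under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() == "pango.aliases":
                yield os.path.join(dirpath, name)


def first_han_font(line):
    """Return "japanese" or "chinese" for the first such font listed, else None."""
    _, _, fonts = line.partition("=")
    for font in fonts.strip().strip('"').split(","):
        name = font.strip().lower()
        if name in JAPANESE:
            return "japanese"
        if name in CHINESE:
            return "chinese"
    return None


def report(root):
    """Print every alias line under root that still puts a Chinese font first."""
    for path in find_pango_aliases(root):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if line.lstrip().startswith("#"):
                    continue  # skip comment lines
                if first_han_font(line) == "chinese":
                    print(f"{path}: {line.strip()}")


if __name__ == "__main__":
    # Assumed location; point this at wherever your GTK programs are installed.
    report(os.environ.get("ProgramFiles", r"C:\Program Files"))
```

If the script prints nothing for a folder, every pango.aliases file it found there already prefers the Japanese fonts.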

2,500 kanji and counting :)

2008 April 9

This post has been moved to singularity.agronesia.net: “2,500 kanji and counting :)”. Please visit the new server.

ni-sen ijou

2008 January 21

Excuse my laziness in blogging… You see, I’m now in this remote place called Sokaraja, and circumstances force me to go to the town of Purwokerto to surf the net. That’s quite far by my standards, and so… Well, enough excuses.

This will be just another monotonous dump, but believe me, the study isn’t as boring as this post looks. I’ve dumped 92 new kanji and 131 new words, for a total of 2,309 kanji and 10,354 words. Believe me, even with this many kanji I’m still humbled by the number of new characters I find every day. Just keep moving on and know no surrender.

To spice things up a bit, I’ll tell you about my current Japanese diet. I’m still trying to finish that WW2 article on Wikipedia. It goes at roughly two paragraphs a day, so hell will probably freeze over first. I’m also playing freeciv, an open source game which has a Japanese translation! Not so much playing as exploring all the text inside and trying to read it. If you’re interested in trying it but have problems, just mail me (I couldn’t just run it and get a usable learning environment myself, but I’m not writing about that now). As explained in another post, I’m also still going through “Japanese: A Comprehensive Grammar”. All that, plus randomly leafing through Japanese books I own or have borrowed.

Ah, I almost forgot… I also now regularly listen to podcasts downloaded from japanesepod101.com. Be sure to visit that site!

So here are the kanji:

肥醸陶婆浸艇殻疫謀喝騰迅肢燥紳捜侯赴薫該貞偵晶拷謹刃彰銃痴斎附帥稼簿弊絞宥邑昌旭禎嘉慧栗堆晒曾傀儡爺塹壕揆簒恫剃蟹宋楷艸已筈馳飴瘡汲釧喉瞿矍攫侭謂唖尖曰籠夭訃凛繚峙骸崖袖嘗袴溺牽溥奢綻

And the words:

(more…)

Dump: 2200 kanji and counting

2008 January 5

A regular run-of-the-mill dump post. So yeah, I still read Japanese materials routinely to find new words and especially kanji, and right now my main sources are the WW2 article on Wikipedia, which is still a long way from finished and is starting to get extremely boring and tiresome (勃発、勃発、侵攻、侵攻), an encyclopedic Japanese grammar book, “Japanese: A Comprehensive Grammar” from “Routledge Grammars”, which I like very much because it contains translations for every example and every example is written in genuine Japanese characters, and some other reading sources like the various Japanese magazines and books I have at my disposal, which I open randomly and on a whim (see screenshot above for an example). Oh, and if you think the previous sentence is too long, blame me for reading too much written Japanese, in which sentences are unreasonably long, which is apparently just for the author’s pleasure in tormenting foreign readers who are not accustomed to such lengthy parsing using their untrained brain, which is actually a very capable biological computer.

For this dump, there are 100 new kanji and 178 new words. Now my kanji count is 2,217 and my word count is 10,223. It might be interesting to know that among those 2,200 or so kanji, I still haven’t encountered six grade 5 kanji and three grade 6 kanji! So much for the commonness of Jouyou kanji.

A note about my method of memorizing these words and kanji. When I encounter a new word, I look it up in an electronic dictionary and then put it in my spreadsheet file of Japanese words (and kanji). I just collect as many there as I find. Then, separately, I enter the words from there into Mnemosyne, first come, first served. The two are not synchronized, so I don’t have to put every word I find into Mnemosyne right away. In fact, I have almost 3,000 words on my word list still waiting to be put into Mnemosyne.
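As a rough illustration of that two-list workflow, here is a small Python sketch. It is not tied to the real spreadsheet or to Mnemosyne’s file format; the function name `pending_words` and the sample words are made up for the example.

```python
def pending_words(word_list, in_mnemosyne):
    """Return the words not yet entered into Mnemosyne, oldest first."""
    return [word for word in word_list if word not in in_mnemosyne]


# Made-up sample data, in the order the words were collected.
word_list = ["勃発", "侵攻", "穀物", "法律"]
in_mnemosyne = {"勃発", "侵攻"}

queue = pending_words(word_list, in_mnemosyne)
print(queue)     # ['穀物', '法律']
print(queue[0])  # the next word to enter: 穀物
```

Because the word list keeps its collection order, taking words from the front of the queue gives the first come, first served behavior.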

Anyway, here are the kanji:

穀律尺酔怠賓克債墨戒併隷循誇呈排斥薪漂錯枠弧賠窒掌覇津襟某斉撲罰封搭溝啓妄祥洲伽麿蘭玖伍綜渚晋叡哉眸鯉緋鳩冴卆蒋并餅孛勃葛盡儘夸牒狼厭猒區謳賭阡萬肆捌陌戊庚癸苺膣腟氐咸股踪幟摺柿匪榧刳肛菐斬荅亢杭釘惧

And the words:

(more…)