Archive for the ‘Thoughts’ Category

Multilanguage support in Windows programs

2008 April 6

This post has been moved to singularity.agronesia.net: “Multilanguage support in Windows programs”. Please visit the new server.

Laborious questions in a test

2007 March 13

Why must instructors give a very “long” problem which doesn’t test understanding any better than a “shorter” problem?

Here’s an example problem to test the understanding of shift cipher:

Encrypt the plaintext “example” using the shift cipher with key B.

That problem should suffice. However, here’s what some instructors like to give:

Encrypt the plaintext “iliketoseemystudentssufferhahahaiamevil” using the shift cipher with key P.

The second problem isn’t intellectually harder; it’s just more laborious!
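To make the point concrete, here is a minimal Python sketch of the shift cipher. It assumes the common convention that the key letter A means a shift of 0, B a shift of 1, and so on; the function name is made up for this illustration. The same few lines handle both problems, so the longer plaintext tests nothing extra.

```python
import string


def shift_encrypt(plaintext: str, key: str) -> str:
    """Encrypt letters with a shift cipher; key letter A is assumed to mean shift 0."""
    shift = ord(key.upper()) - ord("A")
    alphabet = string.ascii_lowercase
    result = []
    for ch in plaintext.lower():
        if ch in alphabet:
            # Move the letter forward by `shift` places, wrapping around after z.
            result.append(alphabet[(alphabet.index(ch) + shift) % 26])
        else:
            result.append(ch)  # leave anything that is not a letter untouched
    return "".join(result)


print(shift_encrypt("example", "B"))  # fybnqmf
```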

I can foresee a similar agony in a microbiology test:

The nucleotide sequence of one DNA strand of a DNA double helix is:
-GGAGATCGCATGCATGCACAGCTGACGATGCA-
(dunno whether it is realistic, I just typed the ATGCs randomly)
What is the sequence of the complementary strand?

Isn’t a strand of -ATGC- enough?
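As a small sketch of why the short strand is enough: the complement is a base-by-base lookup (A pairs with T, G pairs with C), so a longer strand only repeats the same step. The function name is made up, and the result is written in the same left-to-right order as the input; a question that wants the strand read in the opposite direction would additionally reverse it.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


print(complement("ATGC"))  # TACG
print(complement("GGAGATCGCATGCATGCACAGCTGACGATGCA"))
```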

PS: Oh, and about that second example, it’s actually quite nice considering that my instructor gave a LONGER plaintext to encrypt… Unbelievable…

Kanji as a form of data compression

2006 September 24

Using kanji, many ideas can be expressed using just a few characters. For example, here’s how we write the 12 months in various ways:

Kanji Hiragana Roomaji English Indonesian
一月 いちがつ ichigatsu January Januari
二月 にがつ nigatsu February Februari
三月 さんがつ sangatsu March Maret
四月 しがつ shigatsu April April
五月 ごがつ gogatsu May Mei
六月 ろくがつ rokugatsu June Juni
七月 しちがつ shichigatsu July Juli
八月 はちがつ hachigatsu August Agustus
九月 くがつ kugatsu September September
十月 じゅうがつ juugatsu October Oktober
十一月 じゅういちがつ juuichigatsu November November
十二月 じゅうにがつ juunigatsu December Desember
Average characters 2.17 4.17 8.83 6.17 6.25

Note that the average character count drops from roomaji to hiragana. That is expected, since each hiragana symbol represents a mora, which for this discussion can be regarded as a syllable. In roomaji, most syllables must be written using two or more characters. Hiragana can therefore be thought of as compressing roomaji: as a writing unit, a hiragana character is higher-level than a roomaji letter.

The average character count drops again when we go from hiragana to kanji. Kanji is even higher-level than hiragana. Each kanji expresses a certain idea. Because most kanji expand to more than one character when written in hiragana, kanji can be thought of as compressing hiragana.
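The averages in the table can be reproduced with a short Python sketch. It simply counts characters (Unicode code points) per month name and averages over the twelve months; the variable and function names are made up for this example.

```python
months = {
    "Kanji": ["一月", "二月", "三月", "四月", "五月", "六月",
              "七月", "八月", "九月", "十月", "十一月", "十二月"],
    "Hiragana": ["いちがつ", "にがつ", "さんがつ", "しがつ", "ごがつ", "ろくがつ",
                 "しちがつ", "はちがつ", "くがつ", "じゅうがつ", "じゅういちがつ", "じゅうにがつ"],
    "Roomaji": ["ichigatsu", "nigatsu", "sangatsu", "shigatsu", "gogatsu", "rokugatsu",
                "shichigatsu", "hachigatsu", "kugatsu", "juugatsu", "juuichigatsu", "juunigatsu"],
    "English": ["January", "February", "March", "April", "May", "June",
                "July", "August", "September", "October", "November", "December"],
    "Indonesian": ["Januari", "Februari", "Maret", "April", "Mei", "Juni",
                   "Juli", "Agustus", "September", "Oktober", "November", "Desember"],
}


def average_length(words):
    """Average number of characters per word."""
    return sum(len(word) for word in words) / len(words)


for name, words in months.items():
    print(f"{name}: {average_length(words):.2f}")
```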

I’ve heard people say, “kanji is sooo ancient. They should abolish it and replace it with something simpler and modern like the Latin alphabet.” It eventually boils down to the unwillingness to memorize lots of high-level symbols.

However, kanji is a form of pictogram. What they don’t realize is that they also use some pictograms. Ever seen 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0? Great, let’s abolish them. Then we can all have fun writing “sixty five thousand five hundred thirty six” or “enam puluh lima ribu lima ratus tiga puluh enam”.

Anyway, it is natural to ask, “can we define even higher-level elements?”. I don’t see that happening in natural language, but there is one language in which simpler concepts (encoded in symbols) are used to successively build more complex ones: mathematics.

In modern mathematics, everything starts with set theory. There we see symbols like “{”, “}”, “,”, and “⊆”. From sets, we can define things such as the natural numbers, and naturally (no pun intended) new symbols like “1” and “0” appear.

Going even higher level, there is calculus, in which symbols like “∫” appear. Calculus is so high level that, using vector calculus, all classical electromagnetic phenomena can be written in only four equations (the so-called “Maxwell’s equations”).

I think it is astonishing that, using the even higher-level symbols of Clifford algebra, Maxwell’s equations can be written as a single equation.
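For reference, here is one common way to write both forms. The first four lines are Maxwell’s equations in vector calculus notation (SI units). The last line is the single-equation form as it is usually given in the geometric (Clifford) algebra of spacetime, in natural units; sign and unit conventions differ between authors, so treat it as one representative choice rather than the definitive form.

```latex
\begin{align}
  \nabla \cdot \mathbf{E}  &= \frac{\rho}{\varepsilon_0} \\
  \nabla \cdot \mathbf{B}  &= 0 \\
  \nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} \\
  \nabla \times \mathbf{B} &= \mu_0 \mathbf{J}
      + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t} \\[1ex]
  % Geometric algebra form: F is the electromagnetic field, I the unit
  % pseudoscalar, J the spacetime current, \nabla the spacetime gradient.
  \nabla F &= J, \qquad F = \mathbf{E} + I\,\mathbf{B}
\end{align}
```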

Character variants in Unicode

2006 September 19

In Unicode, there are several code points for fullwidth characters. Here’s a comparison between the normal ASCII characters and their fullwidth counterparts (the normal one is written first):

AＡBＢCＣDＤEＥFＦ

Superscript characters like ² are also display variants of normal characters like 2.

Another amusing thing is the existence of language-specific characters. An example is the Greek capital letter eta (Η, U+0397) and the Cyrillic capital letter en (Н, U+041D). On my machine, they look exactly like the Latin capital letter H (which is ASCII 72 or U+0048).
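The difference between the two kinds of variants can be seen with Python’s standard unicodedata module. In this sketch, compatibility normalization (NFKC) folds the display variants back to their plain counterparts, while the look-alike Greek and Cyrillic letters stay distinct characters with their own names. The helper name fold is made up for the example.

```python
import unicodedata


def fold(ch: str) -> str:
    """Apply compatibility normalization (NFKC) to a string."""
    return unicodedata.normalize("NFKC", ch)


# Display variants: fullwidth A (U+FF21) and superscript two (U+00B2).
for ch in ["\uFF21", "\u00B2"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {fold(ch)}")

# Look-alikes from different scripts: Latin H, Greek eta, Cyrillic en.
for ch in ["H", "\u0397", "\u041D"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {fold(ch)}")
```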

I actually have mixed feelings about including display variants in a character set. In light of HTML and various text-formatting utilities (TeX, office suites), display variants can be regarded as a waste of code points. For example, in HTML superscripts can be achieved using the tag <sup> and specific fonts (for example fullwidth) can be chosen using CSS (or the old-style <font> tag). As for language variants, again HTML renders them unnecessary because there is the “lang” (or “xml:lang”) attribute.

However, variants have some merits. One use of those variants is of course for plain text files. For example, with the character “²” I can write “a² + b² = c²” nicely in a plain text file. The other benefit is space efficiency. For example, “²” is one character, while “<sup>2</sup>” takes twelve.

What I hate about language variants is that they conflict with one major theme in the Unicode work: CJK (Chinese Japanese Korean) character unification. In Unicode, there is no such thing as the Japanese 人, Chinese 人, and Korean 人. There is only one character for all three languages: 人. This is in spite of drawing differences between some of the characters! Thus, it is not possible to convey the difference in a plain text file.

For example, here is the CJK character for “now”, displayed differently (if your computer is set up correctly) because of the “lang” attribute: 今 (Japanese) vs. 今 (Chinese). Both are U+4ECA. On my computer it looks like this:

Japanese vs. Chinese 今

See the HTML source code for more info.

Recalling items in an ordered list etc

2006 August 19

Recalling items in an ordered list – 5:22 PM 8/18/2006

Consider an ordered list like days (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday) and the first 10 natural numbers (1, 2, 3, 4, 5, 6, 7, 8, 9, 10). It is interesting that the ability to quickly enumerate the items forward doesn’t translate to the ability to do it backwards.

For example, I can say ‘a’ to ‘z’ very quickly, but I can’t say ‘z’ to ‘a’ quickly. Here are some lists and my ability to enumerate them:

  • Natural numbers from 1 to 10: forward and backwards
  • ‘a’ to ‘z’: forward
  • Musical notes (‘do’ to a higher ‘do’): forward and backwards
  • Days: forward
  • Months: forward

Speaking about numbers and letters, it’s weird that I can compare 2 numbers blazingly fast but my letter comparing speed is very low. For example, given two numbers like 3 and 8 I can quickly grasp that 3 is smaller. However, given 2 letters like ‘o’ and ‘g’ I need to ponder a while before I can decide that ‘g’ comes before ‘o’. Is this because the set of numeric symbols is smaller (10 compared to 26)? Will backward orientation (practicing enumerating ‘z’ to ‘a’) help?

Word dump: random – 4:57 PM 8/18/2006

29 random words, which make my word count 1372:

Kanji Kana English
しんさ judging
ちょくせつ direct
きんちょう tension
ゆうじょう friendship
ぜったい absolute
入力 にゅうりょく input
英数 えいすう English (ASCII) coding
きねんび holiday
すっぴん face with no make-up
めちゃくちゃ mess
あたり前 あたりまえ usual
当たり前 あたりまえ usual
当り前 あたりまえ usual
かっこいい “cool”
入会 にゅうかい admission
びっくり surprise
おばけ ghost
おくさま madam
たすける to rescue
ぼっちゃん son (of others)
へたくそ extreme clumsiness
しかし however, but
マジ serious (not capricious or flirtatious)
バランス balance
ぶらんこ swing
からす to exhaust
プチ small (fr: petit)
行って来ます いってきます I’m off
からす crow, raven

I’ve exhausted my random word stock, so it’s time to do some topical word hunting (colors, body parts, etc.).

Without a digital English-English dictionary etc

2006 August 2

Word Dump: 好きすぎて バカみたい – 8:34 AM 8/2/2006

The current word dump is 好きすぎて バカみたい (Suki Sugite Baka Mitai) by DEF.DIVA (a H!P unit consisting of Abe Natsumi, Goto Maki, Ishikawa Rika, and Matsuura Aya).

Kanji Kana English
みたい -like
ララバイ lullaby
あこがれ yearning
そうしそうあい mutual love
ころ time
まける to lose
ずいぶん extremely
まえ before
こおる to freeze
レンジ stove
かいとう thaw
もどる to turn back
やり直す やりなおす to start over
ねむる to sleep
いじょう more than
めいわく trouble
くちょう tone
さいしょ first
おくり seeing off
下らない くだらない worthless
かえる to go home
うなずく to nod
おわかれ farewell

Stats:

  • Previous “average new words/song”: 19.5
  • New words in this song: 23 (same as previous one)
  • New “average new words/song”: 20.67
  • Total words in word list: 1240
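The new average follows from the old one without re-adding every song. Here is a small sketch of the update; the song count of 2 before this one is not stated above but is inferred from the numbers (19.5 × 2 + 23 = 62, and 62 / 3 ≈ 20.67), and the function name is made up.

```python
def updated_average(previous_average, previous_count, new_value):
    """Fold one more value into a running average."""
    total = previous_average * previous_count + new_value
    return total / (previous_count + 1)


# Assumption: two songs were counted before this one.
print(round(updated_average(19.5, 2, 23), 2))  # 20.67
```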

Without a digital English-English dictionary – 8:37 PM 8/1/2006

When I reinstalled my Windows, I backed up Oxford Dictionary’s folder. However running the executable on the new Windows didn’t work because the program seemed to require a library which was unregistered. Therefore I’m now left without a digital dictionary.

PS: Previously I installed the Oxford dictionary (called “Oxford Advanced Genie”) from a CD that wasn’t mine. The content is of course superb (you can even hear the sound of each word); however, it isn’t free and the user interface is terrible.

OpenOffice.org Writer’s thesaurus isn’t always helpful. For example, giving the thesaurus “gourmet” as an input brings “epicure”, “gastronome”, “bon vivant”, “epicurean”, “foodie”, and “sensualist (general term)” as synonyms. “Sensualist” didn’t seem to fit the context (it was about food). “Foodie” is a term that is absolutely related to food, but what does it mean? A person that likes to eat a lot? No idea. The other words are complete aliens to me.

How about Wiktionary? The entry for “gourmet” is probably there but I can’t access it from home, obviously.

Somehow, I have the installer for linguist (probably copied off a friend). Since it can function as an English to Indonesian dictionary, I installed it. Searching for “gourmet” gives “ahli pencicip makanan”. Nice, but it seems to be a commercial program (which I somehow have illegally obtained) so I’m quite reluctant to use it.

So, currently my legal alternative is a good old printed dictionary (God bless the trees). I currently have “OXFORD Advanced Learner’s DICTIONARY” (4th edition, 1989) which I borrowed from my uncle. “gourmet” gives “person who enjoys and is expert in the choice of fine food, wines, etc”. Nice description.

Another alternative is a “SAT I” book I own. It has a 3500-word list. There, “gourmet” means “connoisseur of food and drink” and “connoisseur” (an alien word) means “person competent to act as a judge of art”. You can’t count on every word being in this book, but when it’s there, you get a nice definition and an example sentence.

Because a paper dictionary is quite a bother to use, I’m going to search for a freely available English to English dictionary database. Something like EDICT, where you can download the database and use any client to view it (or make your own).

Btw, I encountered the word “gourmet” while learning Japanese. In “Hello! Project DVD MAGAZINE volume 7”, the sentence “グルメレポーター。。。” popped up on the screen:

gurume repootaa

Searching “グルメ” (gurume) in EDICT gives “gourmet”, an English word unknown to me (“レポーター” (repootaa) obviously means “reporter”). This is not the first time something like this happened. “curfew”, “stingy”, “sulk”, and “fickle” are some other English words I stumbled upon while learning Japanese. It’s interesting that learning Japanese reveals so many gaps in my English vocabulary.

Update: The solution is to use StarDict.

Bad handphone design – 10:41 PM 7/29/2006

On my phone (Samsung something) the disconnect button (the one with the red phone icon) is placed above the “3/def” button and below the “options” button. When writing an SMS, pressing that button will discard the message and bring you straight to the main screen. The message will be lost!

With that design, a user that accidentally presses the button will lose the message. Murphy’s law states that “when something bad can happen, it WILL happen”. It happened to me already around 2 or 3 times, and I was really pissed off when it happened.

I can envision 2 improvements:

  • When the “discard” button is pressed, prompt the user. This behavior can be found anywhere from Notepad (“The text in the “xyz” file has changed. Do you want to save the changes”) to Firefox (“You are about to close “x” open tabs. Are you sure you want to continue?”)
  • Go to the main menu directly, but save the message in the drafts folder. This “no need to save” behavior can be seen in some programs like Tomboy (a great GTK# note taking program).

It’s pretty simple, really. The principle is “don’t let users do disastrous things easily”. Things like this are now taught in a standard Computer Science course, “Human Computer Interaction” (but not if you get my lecturer, since he didn’t have a clue what the course is all about).

Moyo Go Studio translation done etc

2006 July 4

BOAB naming convention – 4:13 PM 7/3/2006

Since the “BOAB”, “Another BOAB”, “Yet another BOAB” naming scheme sucks, I decided to change it. The new naming scheme will be “[The most interesting BOAB] etc”, for example “I won a 10 million lottery etc”.

Moyo Go Studio translation done – 3:30 PM 7/3/2006

I think the translation (784 strings) has passed sufficient quality control to be released to the wild. Some translation and naming issues that were brought to light:

  • How should menu items be capitalized? “Add Games”, “Add games”, “add games”, or, God forbid, “ADD GAMES”? I chose to capitalize the start of every word, except for some particles like “and”. The English strings aren’t consistent in this matter (for example “Add games” vs. “Save Board Capture As”).
  • Which brings us to the next question: What words are immune to capitalization? “and”, “or”, “in”, “on”… What else? I chose to not capitalize “not” (“tidak”). Is it appropriate? There must be a guideline somewhere…
  • If a menu item contains an explanation in parentheses, how should it be capitalized? An example is “Result (fast, inaccurate)”. I chose to capitalize the explanation also. Is it overcapitalizing?
  • What are the criteria for giving an ellipsis (…) to a menu item? I read in a usability article somewhere that an ellipsis should be used on a menu item that prompts the user for more information before being able to execute the intended action. For example, “Save As…” should use an ellipsis because the program will prompt the user for the file name before being able to save. “About” shouldn’t use an ellipsis because it directly does the intended action, which is displaying the about dialog. I follow this guideline. Some English strings violate this guideline, for example “Search” and “About…”. (Almost every Windows program uses an ellipsis for the “About” menu item.)
  • A convention I don’t like is that the English strings append the ellipsis in the msgid. I prefer that the ellipsis be added at run time when building the UI. What problem does the current practise pose? Here’s one example: There is the msgid “Search”, which is used in the “Database” – “Search” menu. It really should be “Search…”, so I translated it as “Cari…”. It turns out that the msgid “Search” is used on places other than the menu (for example on the button on the dialog of “Database” – “Search”) and in those other places the ellipsis isn’t appropriate.
  • How should one capitalize tooltips? “Back up and take other variation” or “Back Up and Take Other Variation”? I prefer to capitalize only the first letter of the first word.
  • Should tooltips end with a dot? “Back up and take other variation” or “Back up and take other variation.”? Things get messy when a tooltip contains 2 sentences. I prefer to omit the dot for single-sentence tooltips and use dots for multiple-sentence tooltips. There are inconsistencies in the English Moyo Go Studio (for example “Move trough current variation.” vs. “Board perspective”).
  • The translation of a single English word can be multiple words, in which the problem of capitalization arises again. For example, should “Handicap” be translated as “Batu bantu” or “Batu Bantu”? The translator must check where the string will be used to determine the correct capitalization (by running the program and hunting for the string). A serious problem occurs when the string is used in 2 places with different capitalization requirements.

With those problems, it is impossible to create an Indonesian translation that consistently adheres to a good UI naming guideline. Therefore, some compromises must be made. I chose to make the menu perfect even if that means inconsistencies in other places.

I’ll give an example of the inconsistency that arises. Menu capitalization dictates that it is “Langkah Baik”, not “Langkah baik”. Sadly, a tooltip uses the exact same string. Because tooltips are supposed to use a different capitalization rule, the tooltip violates the rule.

My idea is that words in the msgstr should be lowercased except for proper names (which start with an uppercase letter), and then the program will call an appropriate method to build the final string. For example:

string menuString = MakeMenuString(GetString(msgid));
string tooltipString = MakeTooltipString(GetString(msgid));
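Here is a rough Python sketch of what those two helpers could do. Everything in it is hypothetical: the function names, the needs_input flag for appending the ellipsis at run time, and the list of words exempt from capitalization are assumptions made for illustration, not anything Moyo Go Studio provides. The input is assumed to be a lowercased msgstr in which only proper names are capitalized.

```python
# Assumed list of words that stay lowercase inside a menu item.
LOWERCASE_WORDS = {"and", "or", "in", "on", "not", "dan", "atau", "tidak"}


def make_menu_string(text: str, needs_input: bool = False) -> str:
    """Capitalize every word except exempt ones; optionally append an ellipsis."""
    words = []
    for position, word in enumerate(text.split()):
        if position > 0 and word.lower() in LOWERCASE_WORDS:
            words.append(word)
        else:
            # Only the first letter is touched, so proper names survive.
            words.append(word[:1].upper() + word[1:])
    label = " ".join(words)
    return label + "…" if needs_input else label


def make_tooltip_string(text: str) -> str:
    """Capitalize only the first letter of the first word."""
    return text[:1].upper() + text[1:]


print(make_menu_string("back up and take other variation"))
print(make_tooltip_string("back up and take other variation"))
print(make_menu_string("cari", needs_input=True))
```

With this split, the translator writes each string once, and the place where the string is used decides the capitalization and whether an ellipsis is added.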

An alternative is to have multiple msgids for the same word which is used in different places. For example:

msgid "menu-handicap"
msgstr "Handicap"

msgid "tooltip-handicap"
msgstr "Handicap"

I envy languages without uppercase/lowercase mess, like Arabic, Japanese (カタカナはuppercaseじゃない), Korean, and Chinese.

I mentioned many inconsistencies of Moyo Go Studio’s English strings. However, I believe that string consistency is generally an underworked area in many other software projects. For example, Notepad++, the software I use to write this blog entry, also has many inconsistencies in its strings.

Other than string consistency (or the lack of it), another area where Moyo Go Studio sucks is its window layout system. Widgets in its dialogs have fixed coordinates and sizes, so problems like this arise:

With the availability of toolkits like GTK, hardcoding coordinates and sizes is oh so outdated. It is as obsolete as hardcoding the time cycle of old games to clock frequencies (try playing Sonic 3 on a modern computer and you’ll see what I mean).

Moyo Go Studio sucks at at least one other thing: it can’t change language at run time (you need to restart the program). Even my (+Wijaya and Karnan) LifeSimulator can do it :)! Hmm, as far as I remember, even SmartGo can do it 🙂 (tried the time-limited trial version).

Wait, I found another area where MGS sucks: It doesn’t run on Linux (Frank stated clearly that he’s running a Windows software shop so no chance of this happening soon). It would be cool if MGS has a GUI-less server version mode or a library of its core with usage documentation (easier to port), so other people could write a Linux client.

How good is Moyo Go Studio as a Go suite? I’ll probably write a full review some time in the future, after I’ve become more familiar with the beast and if I have free time. My first impression is that it is a superb Go study tool lacking some UI polish. However, don’t let this blog entry discourage you from buying MGS, because the problems mentioned probably won’t bother your daily use and because they will certainly be fixed in the future (after the kickass tactical module project?). Moyo Go Studio is definitely worth your $$ (or time, if you plan to translate it)!

UPDATE: I told Frank about this blog entry. Some days later, on July 7, Frank released an update that adds an extra space to the problematic widgets. That workaround made my Indonesian translation look fine.

TK II done! – 12:27 PM 7/3/2006

At last I finished the TK II! I’ve given the report and CD to a member of the Ilkom office staff, who will hopefully pass them on to Pak Medi.

My TK II was about making SharpJiten, a kanji dictionary written in C# which uses GTK# for the GUI and KANJIDIC for the dictionary. Here’s a screenshot:

Some last remarks:

  • I can’t get GTK# for .NET working. Therefore I opted to bundle Mono in the CD.
  • IconView is a terribly slow widget. Give it 200 items to draw and it will choke. A workaround is to divide the items into separate pages but I didn’t implement it.

A page, which contains SharpJiten’s description, installation method, source code, and TK II report can be found at http://agro.web.ugm.ac.id/sharpjiten.

Since the TK II is done, now I need to finish my other duty: translating Moyo Go Studio.

PS: One flaw in the hardcopy of the report is that pages 12 and 13 are swapped!!!

Early month shopping – 9:12 PM 7/2/2006

Phenomenon: There are many people shopping early in the month (1st or 2nd day). Queues are extremely long.

Hypothesis: People just got their salary.

Explorer/Windows file name limitations – 10:17 PM 7/1/2006

In Windows Explorer, we can’t rename a file to “prn”, “con”, “com0” – “com9”, “lpt0” – “lpt9”, and only God knows what else. That’s because in the Windows world (which originated from the DOS world), those names are special names used to address some devices. For example, “prn” refers to the printer (Which printer? Probably the primary one.) and “con” refers to the console. Turn on your printer (if you have one) and try this command:

copy con prn

You will effectively have a typewriter.

PS: How did I know it? Remembered it from my old DOS times… In those days, tricks like “copy con config.sys” were commonly cited.

That system itself is inferior to Linux. In Linux, devices are files in the /dev directory so you won’t have artificial file name limitations.

But if that’s the limitation of the lower system, I can accept that Explorer doesn’t allow it. However, what’s confusing is that Explorer won’t give any explanation for this behavior. Instead, when a file is renamed to one of those names, it silently reverts the file to the previous name.
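A program that accepts file names from users could at least explain the refusal. Below is a minimal Python sketch of such a check. The name list combines the names mentioned above with “aux” and “nul”, which are also reserved DOS device names; the function name is made up, and the list is an assumption that may not be complete for every Windows version. A reserved name followed by an extension (such as “con.txt”) is treated as reserved too.

```python
# Assumed list of reserved device names; may not be exhaustive.
RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{number}" for prefix in ("COM", "LPT") for number in range(10)
}


def is_reserved_device_name(filename: str) -> bool:
    """Return True if the part before the first dot is a reserved device name."""
    base = filename.split(".")[0].rstrip(" ").upper()
    return base in RESERVED_NAMES


for name in ["prn", "con.txt", "com0", "config.sys", "printer"]:
    verdict = "reserved" if is_reserved_device_name(name) else "fine"
    print(f"{name}: {verdict}")
```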

Next case. NTFS and FAT32 support file names that start with a dot (for example “.bashrc”). As a proof, try:

echo > .test

And a file with the name “.test” will appear.

However, try making a file in Explorer that starts with a dot and Explorer will say “You must type a file name”. Silly.

Another one. NTFS and FAT32 also support file names that start with a space (for example ” foo”), as long as it is followed by a non-space character. As a proof, try:

echo > " foo"

However, when renaming files using Explorer, it will remove the leading space.

I don’t know whether this is a limitation of NTFS and FAT32, but I can’t find a way to make file names that end with a dot (like “crhsab.”) or file names that end with a space (like “foo “).

I’ll test all those in ext3. As an evil hack, I’ll try creating the illegal files on a FAT32 partition under Linux >:).

Another BOAB

2006 July 1

Dying Windows – 10:12 PM 6/30/2006

My Windows installation is starting to choke badly. Some of the annoying things:

  • Disk management using GUI is awfully slow. This is using any file manager.
  • Closed programs remain on the process list (in other words, they’re still running).
  • Programs randomly eat 99% of the CPU cycles. When such a program is terminated forcefully, another program foolishly takes over the CPU-killing job.

I’ll probably reinstall after finishing the TK2 report.

Morning breeze – 6:51 AM 6/30/2006

The cold morning chills me to the bones. Is that why I fear it so much? I have not felt it for ages so that the pain is almost like an exciting thrill.

The SRT subtitle format – 9:38 PM 6/27/2006

Making a subtitle using the SRT format is very easy. The example below is self-explanatory:

1
00:00:02.849 --> 00:00:05.348
JUMP JUMP take offしようぜ!
JUMP JUMP take off shiyouze!

2
00:00:05.400 --> 00:00:08.321
天使の羽を持っている
tenshi no hane wo motteiru

3
00:00:08.309 --> 00:00:11.600
見上げれば 未来
miagereba mirai

4
00:00:11.750 --> 00:00:14.783
Boys & Girls! Be Ambitious!

Some things to note:

  • The file should be saved using the srt extension. Saving in UTF-8 works.
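  • The time format MUST be HH:MM:SS.SSS, zero-padded. For example, 00:00:1.2 won’t work (the “seconds” part must have 5 digits: two for the seconds and three for the milliseconds). Note that files written by SubRip itself put a comma before the milliseconds (00:00:02,849) instead of the dot used above.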
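Since the zero-padding is the part that is easy to get wrong by hand, here is a small Python sketch that builds the timestamps and one subtitle entry. The function names are made up. The separator defaults to the comma that SubRip files normally use; passing "." reproduces the form used in the example above.

```python
def srt_timestamp(seconds: float, separator: str = ",") -> str:
    """Format a time in seconds as a zero-padded HH:MM:SS,mmm timestamp."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def srt_cue(index: int, start: float, end: float, lines, separator: str = ",") -> str:
    """Build one subtitle entry: number, time range, then the text lines."""
    header = f"{srt_timestamp(start, separator)} --> {srt_timestamp(end, separator)}"
    return "\n".join([str(index), header, *lines]) + "\n"


print(srt_cue(1, 2.849, 5.348, ["JUMP JUMP take off shiyouze!"], separator="."))
```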

A fine media player that supports the SRT subtitle format is Media Player Classic. Just activate the “Load Subtitle…” menu item from the “File” menu.

IME not restricted to natural languages – 9:07 PM 6/27/2006

When I activate the Japanese IME, I can pop 私 by typing “watashi”. This works anywhere: from typing in a text editor to renaming files in Explorer. I often imagine how easy computing life would be if the use of an IME were not restricted to natural languages.

Here are some examples:

  • Typing “date” would yield the current date, for example “2006-06-27”
  • Typing “time” would yield the current time, for example “21:12”
  • Typing an equation would yield the result, for example “=2*3” would yield “6”

If using an IME layer is overkill, a system-wide shortcut key should be available to pop up things that don’t require user input (like date and time).
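The conversion step itself is easy to sketch. The Python function below is a hypothetical illustration of the three examples (it is not tied to any real IME API): it maps typed text to its replacement, and evaluates only plain arithmetic so that arbitrary code can’t be run through it.

```python
import ast
import operator
from datetime import datetime

# Only basic arithmetic is supported in "=..." expressions.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Evaluate a parsed arithmetic expression made of numbers and + - * /."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError("unsupported expression")


def expand(typed: str, now=None) -> str:
    """Return the text that the typed input would be converted to."""
    now = now or datetime.now()
    if typed == "date":
        return now.strftime("%Y-%m-%d")
    if typed == "time":
        return now.strftime("%H:%M")
    if typed.startswith("="):
        return str(_evaluate(ast.parse(typed[1:], mode="eval").body))
    return typed  # anything else is left as typed


print(expand("date"), expand("time"), expand("=2*3"))
```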

Another BOAB

2006 June 13

Lazy programmer – 6/12/2006 10:44:41 PM

Once I saw an upperclassman editing a very large text file. It turned out that he was removing unwanted lines. The unwanted lines had a clearly defined pattern, but instead of making a program to remove it, he scanned the lines one by one.

He could program, but didn't use his programming skill. Sure, we don't need to make a program for every imaginable trivial task. However, the text file he edited was so large that the benefit of making the program would clearly outweigh the cost of making it. A lazy programmer is not a programmer.
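For a sense of the cost side, a disposable program for that job can be a few lines. This is a minimal Python sketch; the sample lines and the pattern are invented, since I don't know what the actual unwanted lines looked like.

```python
import re


def remove_matching_lines(lines, pattern):
    """Keep only the lines that do not match the regular expression."""
    unwanted = re.compile(pattern)
    return [line for line in lines if not unwanted.search(line)]


# Hypothetical data: pretend every unwanted line starts with "DEBUG:".
sample = ["keep this", "DEBUG: noise", "keep that", "DEBUG: more noise"]
print(remove_matching_lines(sample, r"^DEBUG:"))
```

For a real file, the same function can be fed the lines read from the file and the result written back out.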

There are lots of small disposable programs on my disk. Some examples are:

  • A program to append a string to all file names in a folder (I once needed to rename hundreds of files)
  • A program to modify each line of a text file in various ways in which new modifiers can be easily added (the need to manipulate strings in a text per line happens really often)
  • A program to correct broken links created by a stupid program called WebCopier

That is because when I am faced with a repetitive task which can easily be programmed, I don't hesitate to make the program. I'm a strong believer that uncreative jobs are better left to machines, while humans better spend their time for creative endeavours.

SharpJiten 1.0 RC – 6/12/2006 10:12:01 PM


The program is essentially complete. On the screenshot you can see SharpJiten set to synchronize itself with the clipboard and additionally display only grade 1-3 kanji. When a bunch of kanji is copied from a Wikipedia article, the program updates itself and displays the relevant kanji.

What remains is tidying the code (throwing out unused commented code, for example) and modifying the program to use an internal (embedded) KANJIDIC instead of a separate KANJIDIC file. Oh, and some days of bug testing to make sure that nothing silly happens.

SharpJiten is a memory hog – 6/12/2006 8:42:02 PM

SharpJiten is the name of the kanji dictionary program I'm building.

On the previous build, SharpJiten only loads some essential kanji info from KANJIDIC. Those fields are the kanji itself in Unicode, (Japanese) readings, English meaning, stroke count, and grade info. This results in a memory footprint of around 26 MB.

However, KANJIDIC has additional fields, more than you can imagine. Some examples are the SKIP code, indexes into various printed dictionaries, and Korean readings. When all fields are loaded, the memory footprint balloons to 40 MB. As a comparison, Wakan requires around 37 MB and JquickTrans 14 MB. The memory usage of JquickTrans is amazing, considering that it is both a kanji and word dictionary.

The KANJIDIC specification specifically states that new fields can be added in later versions. SharpJiten will have a custom filter in which the user can specify an arbitrary field to filter on. This makes SharpJiten forward compatible with future versions of KANJIDIC.
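The idea behind the arbitrary field filter can be sketched as follows. This is Python rather than SharpJiten's C#, and it is not SharpJiten's actual code: entries are modeled as plain dictionaries, the field names are invented, and "future_field" stands for a field that a later version of the dictionary file might add. Because the filter looks fields up by name instead of hardcoding them, an unknown field needs no code change.

```python
def matches(entry, criteria):
    """True if the entry has every requested field and passes each test."""
    return all(
        field in entry and test(entry[field])
        for field, test in criteria.items()
    )


# Hypothetical entries; "future_field" is a made-up later addition.
entries = [
    {"kanji": "一", "grade": 1, "strokes": 1},
    {"kanji": "校", "grade": 1, "strokes": 10},
    {"kanji": "曜", "grade": 2, "strokes": 18, "future_field": "x"},
]

easy = [e["kanji"] for e in entries
        if matches(e, {"grade": lambda g: g <= 3, "strokes": lambda s: s < 12})]
print(easy)

with_new_field = [e["kanji"] for e in entries
                  if matches(e, {"future_field": lambda v: v == "x"})]
print(with_new_field)
```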

About the memory usage, I won't fuss over it. Deadline is approaching (when is it anyway?) and my priority is to have the program working. The only feature not implemented is the arbitrary field filter thing.

Yomiuri Meijin – 6/12/2006 5:38:01 PM

The Meijin is historically a title for the strongest Go player in Japan. The title was later transformed into a prestigious tournament sponsored by Yomiuri Shimbun. The sponsorship was later taken over by Asahi Shimbun.

I have quite a lot of games from the old (Yomiuri) Meijin tournament. Here's the search result on the event tag:

Meijin (Yomiuri): 412
Old Meijin: 101

But of course there are some inappropriately/ambiguously labeled SGFs:
9th meijin 1970: 2
meijin (yomirui): 1
meijin (yomuri): 1
1st meijin 1962 league: 3
4th meijin title match: 2
9th meijin 1970: 2
5th meijin title match: 1

Those labels were found by searching for "meijin" with the date 1962-1975.

So the total is at least 525.

Update: The total is not 525 but 523. Try to spot a mistake on the above BOAB.

sgf2tex – 6/12/2006 1:40:17 PM

I'm currently adapting Minue622's tutorial on Haeng-ma into Indonesian. When I was about to add a pro game example, which the discussion about the iron pillar lacks, I realized it would be really painful to enter the move coordinates one by one into LaTeX. Therefore I created sgf2tex:


Here's a sample output:

\cleargoban
\black{q16,q4,q10,p15,q6,r3,r2,m4,l4,n4,o5,n16,r16,k4,j5,h5,g6,r11,r12,p6,p10,p9,g7,g8,g9,g10}
\white{d4,d16,o17,o4,q3,p3,k3,m3,l3,n3,g16,r17,q17,j4,h4,g5,r10,q11,q7,q9,s9,f6,f7,f8,f9}
\begin{center}
\showfullgoban
{comment}
\end{center}

The annoying thing is that the SGF format uses a different coordinate system from the one most Go players (and clients) are used to. Conventionally, the rows are numbered from the bottom, 1 to 19, while the columns are labeled from left to right, a to t. The letter i is omitted to avoid confusion between capital i (I) and lower case l (as in "lamp"). Compare I to l and you will see that they really look similar. The SGF format labels the rows from the top, using letters from a to s. The columns are labeled from left to right using letters a to s.
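The conversion described above fits in a few lines. This is a Python sketch of the idea, not the actual sgf2tex source; the names are made up. An SGF point is two letters, column first and then row counted from the top.

```python
SGF_LETTERS = "abcdefghijklmnopqrs"    # SGF uses a to s for both axes
BOARD_COLUMNS = "abcdefghjklmnopqrst"  # conventional labels skip the letter i


def sgf_to_board(point: str, size: int = 19) -> str:
    """Convert an SGF point such as 'pd' to a conventional one such as 'q16'."""
    column = SGF_LETTERS.index(point[0])
    row_from_top = SGF_LETTERS.index(point[1])
    # Conventional rows are numbered from the bottom, so flip the row.
    return f"{BOARD_COLUMNS[column]}{size - row_from_top}"


print(sgf_to_board("pd"))  # q16
print(sgf_to_board("dp"))  # d4
```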

pdflatex bug – 6/12/2006 1:28:19 PM

Creating Go diagrams using the igo.sty LaTeX package and then compiling the pdf using pdflatex produces wriggly lines:


This is probably a bug in the very old pdflatex that I use, so I tried the more orthodox, roundabout method.

The method is to create the dvi using latex, then create the ps using dvips, then finally create the pdf using ps2pdf. To my dismay, the wriggly lines are still there!


You can see that the lines near the left edge are wriggly.

What works well is converting the ps to pdf using GSView:


Indexing is slow – 6/11/2006 1:29:01 AM

Extracted the 40 thousand or so games from MGS's web site. Then added those games into Kombilo. The whole indexing process took almost 2 hours on my supposedly-powerful 1.9 GHz processor!

Anyway, I did a search on games involving a historical Meijin. Here are the Meijins (from first to last) and the number of games in my new database:

Honinbo Sansa – 0 (the number of games in my old database is 0)
Inoue Nakamura Doseki – 9 (0)
Yasui Sanchi – 126 (8)
Honinbo Dosaku – 61 (37)
Inoue Dosetsu Inseki – 15 (0)
Honinbo Dochi – 25 (0)
Honinbo Satsugen – 13 (0)
Honinbo Jowa – 64 (0)
Honinbo Shuei – 227 (0)
Honinbo Shusai – 499 (3)

The new database is clearly superb!!!

The number of games in my old database is only around 7 thousand. An interesting observation is that searching for sanrensei on both databases takes the same amount of time (1.8 seconds).

Mnemosyne 0.9.4 – 6/10/2006 9:31:34 PM

Mnemosyne is a flash card program. You give it a set of questions and answers, and the program will schedule those questions for you. In other words, it manages the process of memorizing lots of items. It is a must for anyone learning a natural language (English, Japanese, Arabic, etc) where thousands of items (vocabulary) must be memorized.

I replaced my aging 0.9.2 with 0.9.4. Nothing spectacular, just bug fixes for things that don't affect me. For the upgrade process, I backed up the .mnemosyne folder (Windows Explorer won't let you create files/folders starting with a dot btw), exported my 0.9.2 Mnemosyne data to an XML file, uninstalled 0.9.2, installed 0.9.4, and finally imported the XML file. Everything went smoothly.

Deletion of the contents of boab.txt – 6/10/2006 9:28:13 PM

On my hard disk, a BOAB is stored in the file boab.txt. I then bring the text file along when I surf the net and post the contents to my blog. I've decided that after I post it, I'll empty the contents of boab.txt on my hard disk.

It would be great if my offline BOAB were stored on a wiki. The full history of the file would then be available. I'll try to install MoinMoin Wiki later on Dapper (MoinMoin Wiki is the wiki engine used for Ubuntu's wiki; Wikipedia uses MediaWiki).

BOAB

2006 June 10

Don't know what a BOAB is? Read the last item in this entry (hint, hint).

The numbering of free software projects – 6/9/2006 10:56:57 PM

I always thought of version numbers as decimal numbers. For example, 5.1 is less than 5.11, which is less than 5.2. So, if those 3 numbers are software versions, the idea that sprang to my mind was "5.1 is released first, then 5.11, then 5.2".

Of course I was shocked to learn that free software projects (GNOME, Linux kernel, etc) don't think of it that way. They reason that 1 is less than 2, which is less than 11, so 5.1 comes before 5.2, which comes before 5.11.

Now, after getting used to the free software universe, I have come to like it more than my previous preference. You just need to change your point of view. Think of a book, with the number before the dot as the chapter number and the number after the dot as the section number. For example:

"5.1" is analogous to "chapter 5 section 1"
"5.2" is analogous to "chapter 5 section 2"
"5.11" is analogous to "chapter 5 section 11"

Thinking that way, the sorting makes sense. When multiple dots are present (as in x.y.z.w), think of subsections and subsubsections.

You won't feel nervous again at the thought that GNOME 2.12 is newer than GNOME 2.2.
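The chapter-and-section view can be sketched in a few lines of Python. This is only an illustration of the idea, assuming every dot-separated part is a plain integer (no suffixes like "rc1"); the names `version_key` and `is_newer` are made up here.

```python
def version_key(version):
    """Split "5.11" into (5, 11) so each part is compared as a whole number."""
    return tuple(int(part) for part in version.split("."))


def is_newer(a, b):
    """Return True if version a comes after version b."""
    return version_key(a) > version_key(b)


releases = ["5.2", "5.11", "5.1"]

# My old way of thinking: treat each version as a decimal number.
as_decimals = sorted(releases, key=float)

# The free software way: compare chapter first, then section.
as_versions = sorted(releases, key=version_key)

print(as_decimals)
print(as_versions)
print(is_newer("2.12", "2.2"))
```

Tuples are compared element by element, which is exactly the "chapter, then section, then subsection" rule, and it keeps working for x.y.z.w without any extra code.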

My next computer stats – 6/9/2006 10:32:03 PM

RAM: At least 1 GB (currently 512 MB). My current RAM is very insufficient, since in a normal session I open some IDEs (Visual Studio 2005 Express, SharpDevelop 1.1), some SDK documentation (.NET SDK Documentation, Monodoc), a media player, Firefox (browsers can be crazy memory hogs), dictionaries (English, Japanese), and still other programs (Go client, Go database, file explorer, image editor, etc). At this very moment, task manager indicates that 693 MB of memory is used. 512 MB isn't enough, Q.E.D.

Processor: AMD dual-core processor (currently AMD single-core). Programs are starting to get multithreaded, and it's thrilling to experience the speedup on hardware that can actually run 2 threads at the same time. I'm especially keeping an eye on the development of the multithreaded Go program Moyo Go Studio. Why AMD? Well, why Intel?

Hard disk: RAID 0 configuration (currently no RAID). Who likes to be bottlenecked by that sluggish piece of hardware? Oh, and the capacity should be at least 200 GB (2 x 100 GB).

Video Card: NVIDIA (currently NVIDIA). ATI is notorious for its bad driver support on Linux.

Monitor: LCD which supports 1280×1024 and is at least 17 inches (currently CRT, 1024×768, probably 14 inches). My eyes are tortured by looking at the monitor n hours a day. Switching from CRT to LCD should ease the pain. A larger resolution is needed because complex programs like IDEs and Go database clients have panels everywhere. A small resolution leaves an uncomfortably small space for the main panel. 17 inches is needed so that things at a large resolution won't look tiny (this should be irrelevant in a vector graphics era).

Fan: A silent but powerful fan (currently noisy). The sound pollution emitted from my fan is unbearable. A computer is not a Harley. Noisy is not cool (pun intended).

Writer: DVD writer (currently CD writer). DVD-Rs (the media) are insanely cheap right now. IIRC, with Rp. 4k you can get 4 GB of storage. The perfect solution for backing up my junk.

A powerful editor lacking 1 feature – 6/9/2006 10:23:24 PM

ConTEXT (why the crazy capitalization?) is my text editor of choice on Windows. It supports tabs, syntax highlighting, and most importantly user-defined actions. User-defined actions mean that we can bind some keys (for example F9) to a shell command (for example to compile the file). This effectively makes ConTEXT a bare IDE for any task you can imagine.

However, ConTEXT doesn't support Unicode. Yes, typing "watashi" on the IME yields ‚í‚½‚µ (hiragana) or Ž„ (kanji) in ConTEXT. That is pretty dumb, considering that it's now 2006 and the idea of i18n (internationalization: i-(18 middle characters)-n) isn't anything new. I'll file a bug report on it.
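My guess at where that garbage comes from: the IME hands over Shift-JIS bytes and the editor shows them as if they were Windows-1252 (the usual Western code page). That is an assumption on my part, not something I have confirmed about ConTEXT itself, but the short sketch below reproduces the same kind of output. The helper name `misread` is made up for the example.

```python
def misread(text, actual="shift_jis", assumed="cp1252"):
    """Encode text in one encoding, then decode the bytes as another.

    This mimics a program that receives bytes in `actual` but
    displays them as if they were `assumed`.
    """
    return text.encode(actual).decode(assumed)


hiragana = misread("わたし")  # "watashi" in hiragana
kanji = misread("私")         # "watashi" in kanji

print(hiragana)
print(kanji)

# The damage is reversible as long as no byte was dropped on the way.
print(hiragana.encode("cp1252").decode("shift_jis"))
```

Each Japanese character is two bytes in Shift-JIS, and each of those bytes becomes one unrelated Western character, which is why 3 hiragana turn into 6 symbols and 1 kanji turns into 2.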

BOAB: a neology – 6/9/2006 10:18:57 PM

BOAB stands for "blog on a blog", and this entry is an example of a BOAB. The idea is to blog anything interesting directly at home, and then upload all the offline entries as 1 online blog entry. Since the offline blog entries become part of 1 online blog entry, it is a blog on a blog, or BOAB.

The acronym BOAB doesn't come out of thin air. See FOAF.