HtmlKanjiMarker: my red grades

2007 February 17


I actually wrote this program quite some time ago, but never got around to blogging about it.

HtmlKanjiMarker reads a local HTML file and marks all unknown kanji in red. The list of known kanji comes from two sources. The first is the “Max grade” textbox on the upper right. I entered 4 because I’ve studied all Jouyou kanji of grade 4 and below. The second is a text file, “ExtraKnownKanji.txt”, which should contain every kanji you’ve learned outside the range covered by the textbox.
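To make the idea concrete, here is a minimal Python sketch of this kind of marking. It is not the actual HtmlKanjiMarker code: the function names, the tiny `SAMPLE_GRADES` table, the red `<span>` markup, and the rough regex used to skip over tags are all assumptions made for illustration.

```python
import re

# Hypothetical sample data: a real run would load a full kanji-to-grade table.
SAMPLE_GRADES = {"日": 1, "本": 1, "字": 1, "語": 2, "漢": 3}


def is_kanji(ch):
    # Assumption: only the main CJK Unified Ideographs block counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def build_known(max_grade, extra_known=""):
    # Source 1: every kanji at or below the chosen grade.
    known = {k for k, g in SAMPLE_GRADES.items() if g <= max_grade}
    # Source 2: extra kanji, e.g. the text of a file like ExtraKnownKanji.txt.
    known.update(ch for ch in extra_known if is_kanji(ch))
    return known


def mark_unknown(html, known):
    # Split on tags so only text between tags is touched. This is a rough
    # approach that ignores comments and script blocks.
    parts = re.split(r"(<[^>]*>)", html)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd positions hold the captured tags; copy them unchanged.
            out.append(part)
            continue
        out.append("".join(
            f'<span style="color:red">{ch}</span>'
            if is_kanji(ch) and ch not in known else ch
            for ch in part
        ))
    return "".join(out)
```

Calling `mark_unknown("<p>日本語の漢字</p>", build_known(2))` wraps only 漢 in a red span, since the sample table lists it above grade 2. Raising the grade to 3, or passing 漢 as extra known kanji, leaves the page unmarked.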

Using this program, I can see at a glance how far my current kanji knowledge gets me on a given page. It also makes hunting for new kanji easy. Lastly, it can answer questions such as “what if I learn all grade 5 kanji?” (just change the “Max grade” textbox).

The naive implementation of the algorithm was very slow at marking. This is because an HTML page contains tons of characters, and there are tens of thousands of kanji to check against. I benchmarked and overhauled the algorithm several times. I originally wanted to write about those changes, but I have lost interest by now :).
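I never recorded the actual changes here, so the following is only a generic Python illustration of why a naive check can be slow, not the algorithm the program ended up with. It assumes the naive version scans a list of known kanji for every character, and compares that with a hash-set lookup. All names are hypothetical.

```python
import timeit


def count_unknown_naive(text, known_list):
    # Each "in" test on a list scans it element by element, so the total
    # work grows with len(text) * len(known_list).
    return sum(1 for ch in text if ch not in known_list)


def count_unknown_fast(text, known_list):
    # Build a set once; each membership test is then a hash lookup.
    known = frozenset(known_list)
    return sum(1 for ch in text if ch not in known)


def time_both(text, known_list, repeat=3):
    # Returns (naive_seconds, fast_seconds) for `repeat` runs of each version.
    naive = timeit.timeit(lambda: count_unknown_naive(text, known_list), number=repeat)
    fast = timeit.timeit(lambda: count_unknown_fast(text, known_list), number=repeat)
    return naive, fast
```

Both functions return the same count. Only the cost per character differs, and `time_both` can be used to measure the gap on a real page and kanji list.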

So, here are some generated Wikipedia pages as seen through my eyes of 1249 kanji: Newton, September 11 2001 attacks, Wikipedia. Rest assured, I’m still quite far from literacy…

Keep running, and if tired, walk. A short rest is also fine, just don’t surrender!