
Howto: Convert edict to HTML 3 October, 2008

Posted by aronzak in Uncategorized.

If you’ve ever searched for Japanese language software, chances are you’ve heard of Jim Breen’s edict package, a collection of electronic Japanese dictionaries bundled into one. Unfortunately, the software designed to use the package can be limiting. Edict uses the EUC-JP character encoding, so you will get garbage characters if you try to open it as Unicode (the default). Here’s a simple way to convert the edict, along with kanjidic (a dictionary of kanji, the Japanese pictorial characters originally from China) and compdic (computer terms), using OpenOffice.org.
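To see why the encoding matters, here is a small Python sketch (not part of the OpenOffice.org method). The sample word is made up for illustration. It is encoded as EUC-JP and then decoded twice, once correctly and once as if it were UTF-8.

```python
# A sample word, encoded the way the edict file stores its text (EUC-JP).
word = "日本語"
raw = word.encode("euc_jp")

# Decoding with the matching codec recovers the original text.
decoded_ok = raw.decode("euc_jp")

# Decoding the same bytes as UTF-8 does not: invalid byte sequences are
# swapped for the U+FFFD replacement character instead of raising an error.
decoded_bad = raw.decode("utf-8", errors="replace")

print(decoded_ok)
print(decoded_bad)
```

The second line printed is the kind of garbage you get when a program assumes Unicode for an EUC-JP file.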

1. Edict uses the EUC-JP encoding, not Unicode or ISO Western. In Debian the file is located in /usr/share/edict/. Open it using the file type ‘Text Encoded (OO.o Master Document)’.

You must use 'Text Encoded Master Document'

2. That brings up the import settings. Choose EUC-JP as the character set, not Unicode.

Tada! Now it opens and displays hiragana, katakana and kanji correctly.

3. Now click on ‘Export’

4. Choose HTML, or another format. You can also save it as ordinary text and then edit it however you like. Have fun.
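If you would rather script the conversion than click through it, the steps above can be roughly sketched in Python. This is an alternative to the OpenOffice.org route, not the same thing: the function name `edict_to_html` and the bare-bones page layout are my own assumptions, and it simply wraps the dictionary text in a `<pre>` block.

```python
import html


def edict_to_html(src_path, dst_path, title="edict"):
    """Read an EUC-JP text file and write it out as a UTF-8 HTML page."""
    # errors="replace" is an assumption: it keeps one stray undecodable
    # byte from aborting the whole conversion.
    with open(src_path, encoding="euc_jp", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("<!DOCTYPE html>\n<html>\n<head>\n")
        dst.write('<meta charset="utf-8">\n')
        dst.write(f"<title>{html.escape(title)}</title>\n")
        dst.write("</head>\n<body>\n<pre>\n")
        for line in src:
            # Escape &, < and > so dictionary text cannot be read as markup.
            dst.write(html.escape(line))
        dst.write("</pre>\n</body>\n</html>\n")
```

On Debian you might call it as `edict_to_html("/usr/share/edict/edict", "edict.html")`. The file name `edict` inside that directory is an assumption, so check what is actually installed there first.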
