if it were...: Unicode: Glyphs, Graphemes and Phonemes Primer

2005-07-02

Unicode: Glyphs, Graphemes and Phonemes Primer

Please read the definitions of grapheme (ലിപി), phoneme (വർണ്ണം) and glyph (രൂപം) on Wikipedia.org.

I use grapheme as a synonym for character, and font as a synonym for a collection of glyphs.
As an example, ക and ഷ are basic characters; that is, they are graphemes. However, ക്ഷ is not a grapheme, because it is a combination of ക and ഷ. Therefore, ക്ഷ is just a glyph. We still need to represent ക്ഷ as a separate symbol that is graphically different from both ക and ഷ. This symbol belongs in a font, which is a collection of glyphs and not just the character set. In this example, {ക, ഷ} constitutes the character set and {ക, ഷ, ക്ഷ} constitutes the font.
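To make the character set versus font distinction concrete, here is a minimal Python sketch. It assumes the usual Unicode encoding of ക്ഷ as KA + VIRAMA + SSA; the names `code_points`, `character_set` and `font_glyphs` are made up for this illustration and are not part of any library.

```python
import unicodedata

KA = "\u0D15"      # ക  MALAYALAM LETTER KA
VIRAMA = "\u0D4D"  # ്  MALAYALAM SIGN VIRAMA (joins consonants)
SSA = "\u0D37"     # ഷ  MALAYALAM LETTER SSA

# The conjunct ക്ഷ has no code point of its own; it is a sequence.
conjunct = KA + VIRAMA + SSA


def code_points(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Illustrative model of the example above: the character set holds
# only the basic characters, the font holds every shape to be drawn.
character_set = {KA, SSA}
font_glyphs = {KA, SSA, conjunct}

for line in code_points(conjunct):
    print(line)
print("extra glyphs in the font:", len(font_glyphs - character_set))
```

The conjunct is stored as three code points, yet a font that draws it properly needs one extra glyph for it. Which glyph actually appears on screen depends on the font and the rendering engine, not on this code.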

The real hindrance in understanding this concept is that we compare Asian scripts with the Latin script. In English, there is a one-to-one correspondence between a character and its glyph. In Asian scripts, however, a character can take different graphical forms. For example, the Malayalam character ര has a different graphical form in each of the following words: മരം, ബ്രഹ്മം, വർണ്ണം. The grapheme ര produces three different glyphs in three different contexts.
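The three contexts of ര can be seen at the code point level with a small Python sketch. It rests on one loud assumption: the chillu in വർണ്ണം is spelled as RA + VIRAMA + ZERO WIDTH JOINER, the sequence used before Unicode 5.1 added separate chillu code points. Text that uses the newer chillu character (U+0D7C) would not contain U+0D30 in that word at all. The function name `ra_context` and the context labels are invented for this illustration.

```python
RA = "\u0D30"      # ര
VIRAMA = "\u0D4D"  # ്
ZWJ = "\u200D"     # zero width joiner

# The three words, written with escapes so the encoding is explicit.
WORDS = {
    "maram": "\u0D2E\u0D30\u0D02",                          # മരം
    "brahmam": "\u0D2C\u0D4D\u0D30\u0D39\u0D4D\u0D2E\u0D02",  # ബ്രഹ്മം
    # Assumption: legacy chillu spelling RA + VIRAMA + ZWJ.
    "varnnam": "\u0D35\u0D30\u0D4D\u200D\u0D23\u0D4D\u0D23\u0D02",
}


def ra_context(word):
    """Describe the neighbours of the first RA code point in word."""
    i = word.index(RA)
    before = word[i - 1] if i > 0 else ""
    after = word[i + 1:i + 3]
    if before == VIRAMA:
        return "second member of a conjunct"
    if after == VIRAMA + ZWJ:
        return "chillu (vowelless) form"
    return "independent letter"


for name, word in WORDS.items():
    print(name, "->", ra_context(word))
```

The same code point U+0D30 appears in all three words; only its neighbours differ, and a renderer uses those neighbours to pick the glyph.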

Now about phonemes. The set of phonemes is not the character set. For example, in Malayalam we have two ന in pronunciation, but only one grapheme to represent them. Tamil, by contrast, has two different graphemes (characters) for them. The case is similar with half-റ്റ and റ: they share the same grapheme but are different phonemes. Consider the examples എന്റെ, പാറ്റ, പാറ.

Unicode tries to encode the graphemes (ലിപിമാല/അക്ഷരമാല), not the phonemes (വർണ്ണമാല).
