2005-06-11

Unicode: Chillu: ZWJ shouldn't be overloaded for searching

Software Engineering Pragmatism

I have been in software industry for last 10 years. I know, as many of you do, this: If one entity does not have philosophical integrity, soon its minor functional variations will be interpreted differently or ignored by the developers and eventually it will be a perpetual bug in most of the software applications today or to come. You all can imagine, how much priority this Malayalam bug will get in each company's defect tracking.

To tell an example, couple of months back we faced with an issue of Unicode malayalam text getting truncated in Microsoft Outlook. After some google searches we stumbled upon this piece of information: http://www.landfield.com/usefor/1997/Aug/0142.html. Undoubtedly that would the attitude development process will take towards this special case.

I can clearly understand UTC's fondness towards ZWJ solution - it is just a one line change in the standard and chillus are history or UTC. But for Malayalam community, life with Unicode is just begun. The potential bugs in the software tools and usage diffculties (which Kevin explained) will haunt us for decades. So I would ask all to think twice before committing to overload ZWJ with language specific functions.

I still wonder what Arabic community was thinking when they allowed this to happen for them. Taking Arabic as an example, soon ZWJ will get overloaded many such language specific functions and this simple ZWJ will turn into the most unscrupulous codepoint in entire standard.

No comments:

Post a Comment