2005-07-06

Unicode: Uniqueness Rule

Consider this scenario:



Two encodings are equivalent if they differ only in joiners. The encodings of 'ശബ്ദം' and 'ശബ്‌ദം' are equivalent because they differ only by a ZWNJ. The encodings of 'അവൻ' and 'അവന്' are not, because one has a chillu letter and the other does not.
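As a rough sketch of the encoding check above: strip the joiners from both strings and compare what is left. I am assuming here that "joiners" means exactly ZWJ (U+200D) and ZWNJ (U+200C), and that the chillu in 'അവൻ' is stored as the single code point U+0D7B; the function names are made up for illustration.

```python
# Assumption: "joiners" are exactly ZWNJ (U+200C) and ZWJ (U+200D).
JOINERS = {"\u200c", "\u200d"}


def strip_joiners(text):
    """Return text with every ZWJ and ZWNJ removed."""
    return "".join(ch for ch in text if ch not in JOINERS)


def encoding_equivalent(a, b):
    """True when a and b differ only by joiners."""
    return strip_joiners(a) == strip_joiners(b)


# The example words, written as escapes so the invisible ZWNJ is visible.
shabdam = "\u0d36\u0d2c\u0d4d\u0d26\u0d02"            # without ZWNJ
shabdam_zwnj = "\u0d36\u0d2c\u0d4d\u200c\u0d26\u0d02"  # ZWNJ after the virama
avan_chillu = "\u0d05\u0d35\u0d7b"                     # assumed atomic chillu N
avan_virama = "\u0d05\u0d35\u0d28\u0d4d"               # NA + virama

print(encoding_equivalent(shabdam, shabdam_zwnj))     # True
print(encoding_equivalent(avan_chillu, avan_virama))  # False
```

If the chillu were instead stored as NA + virama + ZWJ, the second pair would come out as equivalent under this check, so the result depends on which chillu encoding is in use.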

Rendering equivalence is more subjective. We know 'ശബ്ദം' and 'ശബ്‌ദം' are equivalent renderings, and 'അവൻ' and 'അവന്' are not. 'അല്പം' and 'അൽ‌പം' are sometimes considered equivalent and sometimes not, so pairs like these cannot participate in this rule.

The fonts can vary from the new orthography font Nila from Bhasha Institute to old orthography fonts like Anjali or Rachana. I don't think it is realistic to consider fonts that have fewer conjuncts than Nila.

The Uniqueness Rule says:

If there is Encoding Equivalence then there should be Rendering Equivalence. (see details on why this is required)
Also, if there is Rendering Equivalence then there should be Encoding Equivalence. (see details on why this is required)
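Taken together, the two statements say that encoding equivalence and rendering equivalence must always agree. A small sketch of a checker for that, assuming joiners are ZWJ and ZWNJ only, and that the rendering judgement is supplied by a person (it is subjective, so the code cannot compute it). The names are hypothetical.

```python
# Assumption: "joiners" are exactly ZWNJ (U+200C) and ZWJ (U+200D).
JOINERS = {"\u200c", "\u200d"}


def encoding_equivalent(a, b):
    """True when a and b differ only by joiners."""
    def strip(text):
        return "".join(ch for ch in text if ch not in JOINERS)
    return strip(a) == strip(b)


def violates_uniqueness(a, b, renders_same):
    """True when encoding and rendering equivalence disagree.

    renders_same is a human judgement about whether the two
    renderings are equivalent; it is an input, not computed here.
    """
    return encoding_equivalent(a, b) != renders_same


shabdam = "\u0d36\u0d2c\u0d4d\u0d26\u0d02"
shabdam_zwnj = "\u0d36\u0d2c\u0d4d\u200c\u0d26\u0d02"

# Same encoding modulo joiners, judged to render the same: rule holds.
print(violates_uniqueness(shabdam, shabdam_zwnj, True))   # False
# Same encoding modulo joiners, judged to render differently: violation.
print(violates_uniqueness(shabdam, shabdam_zwnj, False))  # True
```

Pairs whose rendering equivalence is disputed have no single value for `renders_same`, which is why they are left out of the rule.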

We can consider two versions of this rule. In the lenient version, at least one of the renderings should be valid. A rendering is valid when it is present in the dictionary or is a word combination obeying grammar rules.

In the aggressive version, we consider all possible words, even those outside the dictionary. This could be useful because these words can come from:
  1. Colloquial phrases, often found in novels and stories.
  2. Names of places, people etc.
  3. Future words
