Home | English | Chinese(Simplified) | Chinese(Traditional) | Wap

WebMaster Online Tools


Unicode has the explicit aim of transcending the limitations of traditional character encodings, such as those defined by the ISO 8859 standard, which find wide usage in various countries of the world but remain largely incompatible with each other. Many traditional character encodings share a common problem in that they allow bilingual computer processing (usually using Latin characters and the local script) but not multilingual computer processing (computer processing of arbitrary scripts mixed with each other).

Unicode, in intent, encodes the underlying characters — graphemes and grapheme-like units — rather than the variant glyphs (renderings) for such characters. In the case of Chinese characters, this sometimes leads to controversies over distinguishing the underlying character from its variant glyphs (see Han unification).

In text processing, Unicode takes the role of providing a unique code point — a number, not a glyph — for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape, font or style) to other software, such as a web browser or word processor. This simple aim becomes complicated, however, by concessions made by Unicode's designers in the hope of encouraging a more rapid adoption of Unicode.

The first 256 code points were made identical to the content of ISO 8859-1 so as to make it trivial to convert existing western text. Many essentially identical characters were encoded multiple times at different code points to preserve distinctions used by legacy encodings and therefore allow conversion from those encodings to Unicode (and back) without losing any information. For example, the "fullwidth forms" section of code points encompasses a full Latin alphabet that is separate from the main Latin alphabet section. In Chinese, Japanese and Korean (CJK) fonts, these characters are rendered at the same width as CJK ideographs rather than at half the width. For other examples, see Duplicate characters in Unicode.