About Simplified & Traditional Chinese
E-ging Solutions is one of the largest translation companies in Shanghai. If you'd like to know more about how we can help you, please don't hesitate to contact us through our website.
What's the difference between Simplified & Traditional Chinese, and are they separate in Unicode?
Is it correct that simplified and traditional Chinese are not completely separate sets of code entries in Unicode?
If so, are they simply like two different fonts for the same Unicode point?
Would I have to have a simplified and a traditional font installed?
One traditional character may correspond to several simplified ones, right?
What is Simplified Chinese?
Variant forms of a given Chinese character have developed over time. For example, Japanese has many simplified forms. The number of Chinese characters kept growing, too. In the 1950s, Mainland China decided to reform the Chinese writing system and simplified the shapes of many of the more common characters in use. For example, it chose the same form of 'country' as used in Japanese (国) to replace the previous form (國). However, not all the simplifications adopted were simply taken from existing variants.
The simplification process also simplified certain components that occur in many characters.
Simplification also attempted to define a relatively smaller set of characters for common usage than had traditionally been the case. In many cases, this meant that a single character from the simplified set was used in place of several characters from the larger traditional set.
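The many-to-one relationship described above can be illustrated with a small lookup table. The Python sketch below uses a hand-picked, deliberately incomplete mapping; the names SIMPLIFIED_TO_TRADITIONAL, traditional_candidates and is_ambiguous are hypothetical and not part of any library. The first two entries are well-known cases where one simplified character stands in for more than one traditional character.

```python
# Illustrative, hand-picked table: each simplified character maps to the
# traditional characters it can stand for. This is NOT a complete mapping.
SIMPLIFIED_TO_TRADITIONAL = {
    "发": ["發", "髮"],        # U+53D1 -> U+767C (to emit), U+9AEE (hair)
    "干": ["乾", "幹", "干"],  # U+5E72 -> U+4E7E (dry), U+5E79 (to do), U+5E72 (unchanged)
    "国": ["國"],              # U+56FD -> U+570B (country), one-to-one
}


def traditional_candidates(simplified_char):
    """Return the possible traditional forms for one simplified character.

    Characters missing from the table are assumed to be unchanged.
    """
    return SIMPLIFIED_TO_TRADITIONAL.get(simplified_char, [simplified_char])


def is_ambiguous(simplified_char):
    """True when converting to traditional needs context to pick a form."""
    return len(traditional_candidates(simplified_char)) > 1


if __name__ == "__main__":
    for ch in "发干国世":
        forms = ", ".join(f"U+{ord(t):04X}" for t in traditional_candidates(ch))
        print(f"U+{ord(ch):04X} -> {forms}")
```

Because of entries like the first two, converting simplified text to traditional text generally needs context (a dictionary of words, not just characters), whereas the reverse direction is mostly a plain character substitution.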
Traditional Chinese characters are still used in Taiwan, Hong Kong, and much of the Chinese diaspora. Simplified Chinese characters are used in Mainland China and Singapore. It is important to stress that people speaking many different, often mutually unintelligible, Chinese dialects use one or the other of these scripts to write Chinese; i.e., the characters do not necessarily represent the sounds. There are also a few local characters, such as those used for Cantonese in Hong Kong, that are not in widespread use.
Han unification in Unicode
Next we turn to how these characters are encoded in Unicode, and we have to start with a short word about 'unification' in Unicode.
Unicode provides a superset of most character sets in use around the world, but tries not to duplicate characters unnecessarily. For example, there are several ISO character sets in the 8859 series that all duplicate the ASCII characters. Unicode doesn't have as many codes for the letter 'a' as there are character sets - that would be ridiculous. The same principle applies for Han (Chinese) characters. The initial set of sources for Han encoding in Unicode laid end to end comprised 121,000 characters, but there were many repeats, and the final Unicode tally for all these after elimination of duplicates was 20,902.
(It is said that Chinese people typically use around 3,000 to 4,000 characters for most communication, but a reasonable word processor would need to support at least 10,000. Unicode now supports over 70,000 Han characters.)
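The point about not duplicating the letter 'a' can be checked directly. The following Python sketch decodes the same byte under several parts of ISO 8859 using the standard codecs; the helper name code_points_for_byte and the particular list of encodings are choices made for this illustration only.

```python
# A handful of ISO 8859 parts that Python ships codecs for.
ENCODINGS = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7", "iso-8859-15"]


def code_points_for_byte(byte_value, encodings=ENCODINGS):
    """Decode one byte under each encoding and collect the resulting code points."""
    return {ord(bytes([byte_value]).decode(enc)) for enc in encodings}


if __name__ == "__main__":
    # Shared ASCII range: one code point, however many source sets contain it.
    print(sorted(f"U+{cp:04X}" for cp in code_points_for_byte(0x61)))
    # Upper half: the same byte means different characters in different sets,
    # so Unicode keeps those characters apart.
    print(sorted(f"U+{cp:04X}" for cp in code_points_for_byte(0xE1)))
```

The byte 0x61 collapses to a single Unicode code point no matter which source set it came from, which is the same de-duplication idea that was applied to the Han sources.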
If Han characters had different meanings or etymologies, they were not unified in Unicode. Han characters, however, are highly pictorial in nature. So the (dis-)unification process had to take into account the visual forms to some extent. Where there was a significant visual difference between Han characters that represented the same thing they were allotted to separate Unicode codepoints. (This was a pretty sophisticated process, in fact, carried out over a long period by many East-Asian experts.)
What is left for unification are characters representing the same thing but exhibiting no visual differences, or relatively minor differences such as different sequence for writing strokes, differences in stroke overshoot and protrusion, differences in contact and bend of strokes, differences in accent and termination of strokes, etc.
The codes that remained after unification were all lumped together and sorted by 'radical'. A radical is one of 214 named character components. Nearly all Han characters include one of these radicals. (This is a very simplified view, but I don't think it's necessary to bore you with all the gory details. Also, as more characters were added to the initial 20,000, new characters were stored in different areas of the code space. But let's keep this simple for now.)
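To make the "different areas of the code space" concrete, here is a rough Python sketch that classifies a character by the Han area it falls in. The table covers only a few areas and is not exhaustive; the names HAN_AREAS, han_area and area_size are invented for this example. The ranges are the ones published in the Unicode block charts.

```python
import unicodedata

# Inclusive code point ranges for a few Han areas. The list is partial:
# several later extensions are left out to keep the example short.
HAN_AREAS = [
    ("initial unified set", 0x4E00, 0x9FA5),
    ("later additions to the main block", 0x9FA6, 0x9FFF),
    ("Extension A", 0x3400, 0x4DBF),
    ("Extension B", 0x20000, 0x2A6DF),
]


def han_area(ch):
    """Return the label of the area containing ch, or None if it is not listed."""
    cp = ord(ch)
    for label, first, last in HAN_AREAS:
        if first <= cp <= last:
            return label
    return None


def area_size(label):
    """Number of code points in the range (not all need to be assigned)."""
    for name, first, last in HAN_AREAS:
        if name == label:
            return last - first + 1
    raise KeyError(label)


if __name__ == "__main__":
    print(area_size("initial unified set"))
    for ch in ["国", "國", "a", chr(0x20000)]:
        print(f"U+{ord(ch):04X}", han_area(ch), unicodedata.name(ch, "<unnamed>"))
```

The size of the first range matches the 20,902 figure quoted earlier, and the standard library reports Han characters under algorithmic names of the form CJK UNIFIED IDEOGRAPH-XXXX rather than descriptive ones.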
Simplified vs. Traditional character sets
So now, coming back to the question about simplified vs. traditional character sets...
The Chinese national GB standard (GB 2312) defines a basic set of around 6,000 characters for use with Simplified Chinese writing. It does not include many of the characters in Big 5, the Taiwanese industry standard for Traditional Chinese, which has around 13,000 characters in its basic set. Unicode is, however, a superset of both, with all duplication removed down to the level of detail described above.
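One way to see the difference in coverage is to try encoding the same text with the legacy codecs that ship with Python. This is only a sketch: can_encode and coverage are hypothetical helper names, and the exact results depend on the mapping tables behind Python's gb2312 and big5 codecs.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def coverage(text, encodings=("gb2312", "big5", "utf-8")):
    """Map each encoding name to whether it can represent the whole text."""
    return {enc: can_encode(text, enc) for enc in encodings}


if __name__ == "__main__":
    samples = [
        ("simplified 'country'", "国"),
        ("traditional 'country'", "國"),
        ("'the world'", "世界"),
    ]
    for label, text in samples:
        print(label, coverage(text))
```

Text that uses only characters common to both writing systems should be representable in either legacy set, while UTF-8, as an encoding of Unicode, covers all three samples.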
So the characters for 'country' in Simplified and Traditional Chinese are stored as separate codes, and you cannot simply switch between the two by using a different font. On the other hand, the character for 'the world' looks the same in both Simplified and Traditional writing, and both writing systems share the same code point. Then there are characters which share a code point because they are not significantly different in appearance, but which may typically exhibit systematic differences in stroke overshoot and rotation of minor strokes between simplified and traditional writing systems. To see these correctly you need to apply the right font, e.g. a Song font for simplified and a Ming font for traditional.
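The distinction between separate and shared code points is easy to inspect. The Python sketch below compares two strings position by position; compare_code_points is a made-up helper for this illustration, and the sample words ('country' plus 'home', and 'the world') were chosen because they show both cases.

```python
def compare_code_points(simplified, traditional):
    """Pair up two equal-length strings and report, per position, the two
    code points and whether both writing systems share the same one."""
    if len(simplified) != len(traditional):
        raise ValueError("strings must have the same number of characters")
    report = []
    for s, t in zip(simplified, traditional):
        report.append((f"U+{ord(s):04X}", f"U+{ord(t):04X}", s == t))
    return report


if __name__ == "__main__":
    # 'country' differs between the two systems, 'home' does not.
    print(compare_code_points("国家", "國家"))
    # 'the world' is written with the same code points in both.
    print(compare_code_points("世界", "世界"))
```

Where the code points differ, no font change can turn one form into the other; the text itself has to be converted. Where they are shared, any remaining difference in appearance is purely a matter of which font is applied.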