Unicode

Unicode is an international standard for encoding text, with the aim of including every character needed to write each of the world's languages. Unlike ASCII, which only knows the standard characters without accents, and the widely used Latin-1, Unicode makes it possible for software to handle different written languages, even when those languages are used together in a single document.


Background

The initiative for Unicode came from a number of organizations that wanted to bring order to the chaos of encodings for the various character sets. Since version 1.1, Unicode has followed the ISO/IEC 10646 standard exactly.

In ASCII and the ISO 8859 encodings (to which Latin-1 belongs), as well as in code pages, each character of a language is stored in a single byte. This leaves room for 2^8 = 256 different characters. In practice there are fewer, because these encodings also reserve some codes for special purposes (such as the space and the line break).
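A minimal sketch in Python, assuming Python 3 and its built-in `latin-1` and `ascii` codecs; the helper name `single_byte_encode` and the sample word are illustrative, not part of any standard. It shows the one-byte-per-character rule and what happens when a character is missing from the table:

```python
# Sketch: in a single-byte encoding every character occupies exactly one byte,
# so at most 2**8 = 256 distinct values are available.
from typing import Optional

BYTE_VALUES = 2 ** 8  # number of distinct values one 8-bit byte can hold


def single_byte_encode(text: str, encoding: str) -> Optional[bytes]:
    """Encode text with a legacy codec; return None if a character is not in its table."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    print(BYTE_VALUES)                            # 256
    print(single_byte_encode("café", "latin-1"))  # b'caf\xe9': one byte per character
    print(single_byte_encode("café", "ascii"))    # None: 'é' is not in ASCII
```

Strictly speaking, ASCII uses only 7 of the 8 bits (128 codes), which is why `é` fails there even though it fits in a single Latin-1 byte.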

For many other writing systems, such as Chinese, 256 characters are not enough. For these larger character sets, other encodings that use more space per character have long been in use.
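To illustrate an encoding that spends more space per character, the sketch below uses GB2312 through Python's built-in `gb2312` codec. GB2312 is our choice of example (the text above does not name a specific encoding), and `bytes_per_char` is a hypothetical helper:

```python
# Sketch: a multi-byte legacy encoding (GB2312, via Python's "gb2312" codec)
# spends two bytes on each Chinese character but keeps ASCII at one byte.
from typing import List


def bytes_per_char(text: str, encoding: str) -> List[int]:
    """Return how many bytes each character of text occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]


if __name__ == "__main__":
    print(bytes_per_char("A中文", "gb2312"))  # [1, 2, 2]
```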

Even with all these different encodings for the different languages, it was still not possible to mix in text from another language. If you wanted to write about Arabic in a Chinese text, the standard Chinese encoding did not allow it: the Arabic characters are simply not included in it. Unicode and its associated encodings offer a solution to this kind of problem for all languages.
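A hedged sketch of the Chinese-and-Arabic scenario, assuming GB2312 (Python's `gb2312` codec) stands in for "the standard Chinese encoding"; the sample string and the `can_encode` helper are illustrative only:

```python
# Sketch: a legacy Chinese encoding cannot hold Arabic letters, while UTF-8 holds both.

def can_encode(text: str, encoding: str) -> bool:
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Two Chinese characters, a space, then the Arabic word for "Arabic".
MIXED = "中文 العربية"

if __name__ == "__main__":
    for enc in ("gb2312", "utf-8"):
        print(enc, can_encode(MIXED, enc))
    # UTF-8 also round-trips the mixed text without loss.
    print(MIXED.encode("utf-8").decode("utf-8") == MIXED)
```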

Development

The first version of Unicode offered room for 65,536 (= 2^16) different characters, of which about 40,000 were initially assigned. However, 65,536 code points are too few to cover all the world's languages; a single writing system such as Chinese already comprises about 25,000 characters. For this reason, the number of possible characters in the Unicode standard was later extended to just over one million: code points now run from 0 up to 2^20 + 2^16 - 1, which is 0x10FFFF in hexadecimal.
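The figures above can be checked with a few lines of arithmetic. This Python sketch is illustrative and the constant names are our own. The total of 2^20 + 2^16 code points also equals 17 blocks (called planes) of 2^16 code points each:

```python
# Sketch: the arithmetic behind the size of the Unicode code space.

ORIGINAL_SPACE = 2 ** 16                 # first design: 65,536 possible characters
MAX_CODE_POINT = 2 ** 20 + 2 ** 16 - 1   # highest code point, 0x10FFFF
TOTAL_CODE_POINTS = MAX_CODE_POINT + 1   # code points run from 0 to MAX_CODE_POINT inclusive

# The same total, seen as planes of 2**16 code points each.
PLANES = TOTAL_CODE_POINTS // ORIGINAL_SPACE

if __name__ == "__main__":
    print(ORIGINAL_SPACE)       # 65536
    print(hex(MAX_CODE_POINT))  # 0x10ffff
    print(TOTAL_CODE_POINTS)    # 1114112
    print(PLANES)               # 17
    # Python's chr() accepts exactly this range and raises ValueError above it.
    print(chr(MAX_CODE_POINT) == "\U0010FFFF")  # True
```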

It is not possible to encode every Unicode character in a single byte, or even in two. Early versions of the Unicode standard left this problem open: the so-called encodings, which specify how characters are stored as bytes, were not part of the standard itself. Several standards were therefore devised for serializing Unicode text as sequences of bytes. The most obvious methods are UCS-2 and UCS-4, which use 2 and 4 bytes per character respectively. Because this wastes space for texts in the familiar Latin alphabet, and is moreover completely incompatible with earlier encodings that do not have this drawback, UTF-8 and UTF-16 were devised; these encodings were later incorporated into the Unicode standard. There is also the UTF-7 encoding, which exists as a separate standard but has only limited applications. UTF-8 has the great advantage of being fully compatible with 7-bit ASCII: ASCII characters are stored as the same single bytes. All other characters, however, are stored differently than in the older encodings.
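The following Python sketch makes the size trade-off and the ASCII compatibility concrete. It assumes Python's built-in codecs, using `utf-16-le` and `utf-32-le` to avoid a byte-order mark; `utf8_encode_code_point` is a simplified, hypothetical helper written for illustration (it does not reject surrogate code points), checked against Python's own UTF-8 codec:

```python
# Sketch: byte cost of fixed-width vs variable-width encodings,
# plus a hand-written UTF-8 encoder showing why ASCII bytes are unchanged.
from typing import Dict


def encoded_sizes(text: str) -> Dict[str, int]:
    """Bytes needed for text in several encodings (little-endian, no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # 2 bytes per character here, like UCS-2
        "utf-32": len(text.encode("utf-32-le")),  # 4 bytes per character, like UCS-4
    }


def utf8_encode_code_point(cp: int) -> bytes:
    """Minimal UTF-8 encoder for one code point (illustrative; no surrogate checks)."""
    if cp < 0x80:      # 1 byte: 0xxxxxxx, identical to 7-bit ASCII
        return bytes([cp])
    if cp < 0x800:     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:  # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point outside the Unicode range")


if __name__ == "__main__":
    print(encoded_sizes("Unicode"))  # {'utf-8': 7, 'utf-16': 14, 'utf-32': 28}
    for ch in "Aé中\U0001F600":
        assert utf8_encode_code_point(ord(ch)) == ch.encode("utf-8")
    print(utf8_encode_code_point(ord("A")))  # b'A': the same byte as in ASCII
```

The first branch is the whole story of ASCII compatibility: any code point below 0x80 becomes one byte with the same value, so existing ASCII files are already valid UTF-8.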

Support

Unicode currently supports each of the languages listed below:


See also

External links

 

Source: nl.wikipedia.org (machine translated from Dutch into English)