[Info-vax] Does OpenVMS Use Unicode?

Johnny Billquist bqt at softjar.se
Tue Jun 14 14:08:25 EDT 2016


On 2016-06-14 13:28, Neil Rieck wrote:
> Not wanting to engage in a flame war, the following quote from a popular web site says it all:
>
> The original specification covered numbers up to 31 bits (the original limit of the Universal Character Set). In November 2003, UTF-8 was restricted by RFC 3629 to end at U+10FFFF, in order to match the constraints of the UTF-16 character encoding. This removed all five- and six-byte sequences, and almost half the four-byte sequences.

Where did you get that from? Unicode started out as a 16-bit character set.

Unicode was expanded beyond 16 bits only in 1996 by Unicode 2.0.

Are you confusing Unicode with ISO 10646 perhaps?

Oh, you might actually be reading the page on UTF-8. Maybe it's worth 
repeating: UTF-8 is an encoding scheme. The character set is Unicode.
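
To make that distinction concrete, here is a minimal sketch in Python (chosen only for illustration; the euro sign is an arbitrary example character). It shows one Unicode code point and three different byte encodings of that same code point:

```python
# One character from the Unicode character set: EURO SIGN, code point U+20AC.
ch = "\u20ac"
code_point = ord(ch)  # the number Unicode assigns to the character

# Three encoding schemes for that same code point.
utf8 = ch.encode("utf-8")       # variable length, 1 to 4 bytes
utf16 = ch.encode("utf-16-be")  # one or two 16-bit units, big-endian here
utf32 = ch.encode("utf-32-be")  # always one 32-bit unit, big-endian here

print(f"code point: U+{code_point:04X}")
print("UTF-8 :", utf8.hex(" "))
print("UTF-16:", utf16.hex(" "))
print("UTF-32:", utf32.hex(" "))
```

The code point stays the same in all three cases; only the bytes on the wire differ.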

> ###
>
> This restricts UTF-8 (which is a Unicode encoding) to a subset of the entire Unicode map. BTW, there are large holes (called planes) in the Unicode map which allow for future growth. But new codes will not appear in UTF-8 unless RFC 3629 is superseded.

But the Unicode "map" only covers 0x0 to 0x10FFFF anyway, so it's not a
subset. UTF-8 was simply "trimmed" to cover exactly what is needed
to encode all Unicode code points.
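
A small Python sketch of that point, assuming a stock CPython 3 interpreter (the helper names `utf16_pair` and `rejected` are made up for this illustration). It checks that the highest code point fits in four UTF-8 bytes, that the same value is the highest one a UTF-16 surrogate pair can reach, and that a strict UTF-8 decoder refuses anything beyond it:

```python
MAX_CODE_POINT = 0x10FFFF  # top of the Unicode code space

# The highest code point needs exactly four bytes in UTF-8.
top_utf8 = chr(MAX_CODE_POINT).encode("utf-8")

def utf16_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    offset = cp - 0x10000            # 20 bits remain to be encoded
    high = 0xD800 + (offset >> 10)   # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # lower 10 bits
    return high, low

# The largest possible pair (DBFF, DFFF) lands exactly on U+10FFFF,
# which is where the limit in RFC 3629 comes from.
top_pair = utf16_pair(MAX_CODE_POINT)

def rejected(raw: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder refuses these bytes."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False

# Bytes that would encode U+110000, one past the end of Unicode.
beyond_unicode = bytes([0xF4, 0x90, 0x80, 0x80])
# A five-byte sequence from the original 31-bit UTF-8 design.
old_five_byte = bytes([0xF8, 0x88, 0x80, 0x80, 0x80])

print(top_utf8.hex(" "))
print([hex(unit) for unit in top_pair])
print(rejected(beyond_unicode), rejected(old_five_byte))
```

Nothing in Unicode is lost by the restriction: every assignable code point, in every plane, still has a UTF-8 encoding of at most four bytes.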

	Johnny



