[Info-vax] 8-bit characters

Arne Vajhøj arne at vajhoej.dk
Thu Nov 11 11:21:42 EST 2021


On 11/10/2021 11:48 PM, Lawrence D’Oliveiro wrote:
> On Thursday, November 11, 2021 at 3:33:33 PM UTC+13, Arne Vajhøj wrote:
>> The biggest problems with UTF-8 is that the byte length is not
>> necessarily the character length ...
> 
> That would be true of any Unicode encoding, even UCS-4.

No.

It is a practical problem in UTF-8, as every character outside ASCII
takes more than 1 byte (2 to 4 bytes).

It is a theoretical problem in UTF-16 because there are defined Unicode
code points that take more than 2 bytes (those outside the BMP are
encoded as a 4-byte surrogate pair; they are just extremely rare).

It is not a problem for UTF-32, as every code point is exactly 4 bytes.
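
A quick way to see the three cases is to encode a few sample code points
and compare the byte counts. This is only a sketch: it assumes Python 3,
treats "character" as a single code point, and uses the BOM-less
little-endian codecs so that no byte order mark is counted. The helper
name encoded_lengths is made up for illustration.

```python
# Illustrative sketch: bytes needed per code point in each encoding.
# Samples: U+0041 (ASCII), U+00F8 (Latin-1 range), U+20AC (rest of the
# BMP), U+1F600 (outside the BMP, so a surrogate pair in UTF-16).
SAMPLES = ["A", "\u00f8", "\u20ac", "\U0001F600"]


def encoded_lengths(text):
    """Return the encoded byte length of text in UTF-8, UTF-16 and UTF-32.

    The "-le" codecs are used so Python does not prepend a BOM.
    """
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for ch in SAMPLES:
        # len(ch) is 1 for every sample: Python counts code points.
        print(f"U+{ord(ch):04X}", len(ch), encoded_lengths(ch))
```

With these samples the UTF-8 length varies from 1 to 4 bytes, the UTF-16
length is 2 bytes except for the code point outside the BMP (4 bytes),
and the UTF-32 length is always 4 bytes.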

Arne




