[Info-vax] 8-bit characters
Arne Vajhøj
arne at vajhoej.dk
Thu Nov 11 11:21:42 EST 2021
On 11/10/2021 11:48 PM, Lawrence D’Oliveiro wrote:
> On Thursday, November 11, 2021 at 3:33:33 PM UTC+13, Arne Vajhøj wrote:
>> The biggest problem with UTF-8 is that the byte length is not
>> necessarily the character length ...
>
> That would be true of any Unicode encoding, even UCS-4.
No.
It is a practical problem in UTF-8 as every character outside ASCII
takes more than 1 byte (2 to 4 bytes).
It is a theoretical problem in UTF-16 because there are defined Unicode
code points (those outside the BMP) that take more than 2 bytes: they
are encoded as a surrogate pair, 4 bytes in total (they are just
extremely rare).
It is not a problem for UTF-32 as every code point is exactly 4 bytes.
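A quick way to see the difference is to encode a few characters and
count the bytes. This is only a rough sketch in Python; the sample
characters and the helper name encoded_lengths are my own picks for
illustration, and the little-endian codecs are used so that no BOM is
counted.

```python
# Bytes needed per code point in each Unicode encoding form.
# The sample characters are illustrative choices (assumptions).
SAMPLES = [
    "A",           # U+0041, ASCII
    "\u00f8",      # U+00F8, Latin small letter o with stroke
    "\u20ac",      # U+20AC, euro sign (inside the BMP)
    "\U0001F600",  # U+1F600, an emoji (outside the BMP)
]

def encoded_lengths(ch):
    """Return (utf8, utf16, utf32) byte counts for one code point, no BOM."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )

for ch in SAMPLES:
    u8, u16, u32 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {u8}  UTF-16: {u16}  UTF-32: {u32}")
```

Only the UTF-32 column stays constant at 4. UTF-16 stays at 2 until the
code point is outside the BMP, and UTF-8 leaves 1 as soon as the
character is not ASCII.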
Arne