[Info-vax] 8-bit characters

Arne Vajhøj arne at vajhoej.dk
Thu Nov 11 19:21:38 EST 2021


On 11/11/2021 4:57 PM, Lawrence D’Oliveiro wrote:
> On Friday, November 12, 2021 at 5:21:48 AM UTC+13, Arne Vajhøj wrote:
>> On 11/10/2021 11:48 PM, Lawrence D’Oliveiro wrote:
>>> On Thursday, November 11, 2021 at 3:33:33 PM UTC+13, Arne Vajhøj wrote:
>>>> The biggest problem with UTF-8 is that the byte length is not
>>>> necessarily the character length ...
>>>
>>> That would be true of any Unicode encoding, even UCS-4.
>>
>> No.
> 
> You didn’t know, then, that what Unicode codes define are not characters, but code points?

Nonsense.

<quote>
The Unicode Standard specifies a numeric value (code point) and a name
for each of its characters.
...
Unicode characters are represented in one of three encoding forms: a
32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8).
</quote>
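
To make the byte length point concrete, here is a small sketch in Python. The function name encoded_lengths and the sample string are my own choices for illustration, not anything from the standard. It counts code points (which is what Python's len() reports for a str) and compares that with the encoded size in each of the three encoding forms.

```python
def encoded_lengths(text: str) -> dict:
    """Return the code point count and the encoded byte length per encoding form.

    Hypothetical helper for illustration. The "-le" codec variants are used
    so that no byte order mark is added to the byte counts.
    """
    return {
        "code_points": len(text),                    # len() of a str counts code points
        "utf8_bytes": len(text.encode("utf-8")),     # 1 to 4 bytes per code point
        "utf16_bytes": len(text.encode("utf-16-le")),  # 2 or 4 bytes per code point
        "utf32_bytes": len(text.encode("utf-32-le")),  # always 4 bytes per code point
    }


# Sample input (an assumption for this example): "ø" is U+00F8,
# which needs two bytes in UTF-8 but is still a single code point.
lengths = encoded_lengths("Vajhøj")
print(lengths)
```

With this input the UTF-8 byte count (7) differs from the code point count (6), while the UTF-32 byte count is exactly four times the code point count.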

Arne




