[Info-vax] 8-bit characters
Lawrence D’Oliveiro
lawrencedo99 at gmail.com
Thu Nov 11 18:01:06 EST 2021
On Friday, November 12, 2021 at 11:45:39 AM UTC+13, I wrote:
> On Friday, November 12, 2021 at 7:53:21 AM UTC+13, Arne Vajhøj wrote:
>> <quote>
>> Each Unicode code point is represented directly by a single 32-bit
>> code unit. Because of this, UTF-32 has a one-to-one relationship
>> between encoded character and code unit; it is a fixed-width character
>> encoding form.
>> </quote>
>
> Beware of terminology! What a normal person might call a “character”, they call a “text
> element”. This is represented by one or more of what they are calling an “encoded character”.
Actually, the term “text element” is less specific than that. More accurate terms, according to <https://www.unicode.org/reports/tr29/tr29-39.html>, would be “user-perceived character” or “grapheme cluster”.
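
A minimal Python sketch of the distinction, assuming only the standard library. The helper name approx_clusters is hypothetical, and it only attaches combining marks to the preceding base code point; it is a rough approximation, not the full segmentation algorithm from the report linked above (which also covers Hangul jamo, emoji sequences, regional indicators and more).

```python
import unicodedata

def approx_clusters(text):
    """Group each base code point with the combining marks that follow it.

    Assumption/simplification: this is NOT full UAX #29 segmentation;
    it only handles code points with a nonzero canonical combining class.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Combining mark: attach it to the previous cluster.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters

s = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

code_points = len(s)                           # Python str length counts code points
utf32_units = len(s.encode("utf-32-le")) // 4  # one 32-bit code unit per code point
clusters = approx_clusters(s)                  # what a reader would call "characters"

print(code_points, utf32_units, len(clusters))
```

Here the UTF-32 code unit count equals the code point count, as the quoted passage says, but both are larger than the number of user-perceived characters: the string displays as a single accented letter.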