[Info-vax] 8-bit characters

Lawrence D’Oliveiro lawrencedo99 at gmail.com
Thu Nov 11 18:01:06 EST 2021


On Friday, November 12, 2021 at 11:45:39 AM UTC+13, I wrote:
> On Friday, November 12, 2021 at 7:53:21 AM UTC+13, Arne Vajhøj wrote: 
>> <quote> 
>> Each Unicode code point is represented directly by a single 32-bit 
>> code unit. Because of this, UTF-32 has a one-to-one relationship 
>> between encoded character and code unit; it is a fixed-width character 
>> encoding form. 
>> </quote>
>
> Beware of terminology! What a normal person might call a “character”, they call a “text
> element”. This is represented by one or more of what they are calling an “encoded character”. 

Actually, the term “text element” is less specific than that. More accurate terms, according to <https://www.unicode.org/reports/tr29/tr29-39.html>, would be “user-perceived character” or “grapheme cluster”.
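
To make the distinction concrete, here is a minimal Python sketch. It shows that UTF-32 is fixed-width per code point but not per user-perceived character. The helper name approx_user_perceived_length is hypothetical. It only folds combining marks into the preceding character, so it is a rough approximation and not the grapheme cluster algorithm from UAX #29.

```python
import unicodedata


def approx_user_perceived_length(text: str) -> int:
    """Rough count of user-perceived characters (hypothetical helper).

    Assumption: every code point with a non-zero canonical combining class
    belongs to the preceding character. This is NOT the full UAX #29
    algorithm; it ignores emoji ZWJ sequences, Hangul jamo, regional
    indicator pairs, and other cluster rules.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: displays as one "é".
s = "e\u0301"

code_points = len(s)                           # 2 code points
utf32_units = len(s.encode("utf-32-le")) // 4  # 2 code units, one per code point
perceived = approx_user_perceived_length(s)    # 1 under the approximation above

print(code_points, utf32_units, perceived)
```

So the one-to-one relationship in the quoted passage holds between code points and 32-bit code units, while a single user-perceived character can still span several of them.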


