[Info-vax] 8-bit characters
Lawrence D’Oliveiro
lawrencedo99 at gmail.com
Thu Nov 11 17:02:56 EST 2021
On Friday, November 12, 2021 at 7:17:58 AM UTC+13, Craig A. Berry wrote:
> Back when it was called UCS-4, I think that was true.
It was never true. In Unicode, a “character” consists of a base code point followed by any number of combining code points. Some combinations may have their own assigned code point; many don’t.
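
As a rough illustration of the last point, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `nfc_length` is made up for this example. It assumes NFC normalization is a fair way to ask whether a combination has its own assigned code point: "e" plus a combining acute accent composes to a single code point, while "q" plus the same accent has no precomposed form and stays as two.

```python
import unicodedata

def nfc_length(text: str) -> int:
    """Number of code points left after NFC (canonical composition)."""
    return len(unicodedata.normalize("NFC", text))

e_acute = "e\u0301"  # "e" + COMBINING ACUTE ACCENT
q_acute = "q\u0301"  # "q" + COMBINING ACUTE ACCENT

print(len(e_acute), nfc_length(e_acute))  # 2 1 (composes to U+00E9)
print(len(q_acute), nfc_length(q_acute))  # 2 2 (no precomposed code point)
```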
> But even if the
> encoding is not varying width, the number of characters displayed might
> not match the number of code points because of things like combining
> characters.
And this is exactly why it is worth keeping the distinction between “code points” and “characters” in mind.
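
To make the distinction concrete, here is a small Python sketch that counts "characters" in the sense used above: a base code point plus any combining code points that follow it. The function name `count_characters` is hypothetical, and treating every code point in a Unicode "Mark" category as combining is a simplifying assumption. Full grapheme cluster segmentation has more rules (joiners, Hangul jamo, emoji sequences), so take this as an approximation only.

```python
import unicodedata

def count_characters(text: str) -> int:
    """Count code points that are not combining marks.

    Approximation: each non-mark code point starts a new "character";
    general categories Mn, Mc and Me attach to the preceding base.
    """
    return sum(
        1 for ch in text
        if not unicodedata.category(ch).startswith("M")
    )

name = "Zoe\u0308"  # "Zoë" spelled with a combining diaeresis
print(len(name))               # 4 code points
print(count_characters(name))  # 3 characters
```

Note that `len()` on a Python string counts code points, so it is the first number, not the second, that most string APIs report.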