[Info-vax] Character sets

Thu Sep 8 19:22:05 EDT 2022

On 2022-09-07 23:19:56 +0000, Arne Vajhøj said:

> On 9/7/2022 9:08 AM, Johnny Billquist wrote:
>> On 2022-09-06 20:42, Arne Vajhøj wrote:
>>> On 9/3/2022 3:30 PM, Stephen Hoffman wrote:
>>>> Pedant notes: yes, I do know about wchar_t and friends in C and C++, 
>>>> which is... a mess, and is also ill-suited for UTF-8.  Probably better 
>>>> to use char16_t and char32_t, if you do need fixed-width wide character 
>>>> storage.
>>> 
>>> wchar_t is a typical C vague definition where char16_t and char32_t are 
>>> much more clearly defined.
>> 
>> wchar_t was an invention from before Unicode came about. And it's 
>> fairly incompatible with the ideas in Unicode.
> 
> It is crazy vague in the C standard.
> 
> But on common platforms it is just utf-16 or utf-32.

Maybe I was unclear.

C string handling is bad.

C UTF-8 handling is worse.

As defined, wchar_t is... less than useful.

Sure, if I want piles of glue code, it can be sorta workable. Kinda. Maybe.

But I have OpenVMS for excess glue code.

And OpenVMS UTF-8 handling is ~negligible.

Including BASIC, Fortran, COBOL, Pascal, and the entirety of the 
OpenVMS system APIs, past ODS-5 UTF-8.

While a step or three in the right direction, Python, Perl, and Java 
won't help with OpenVMS here, either.

Not on OpenVMS.

-- 
Pure Personal Opinion | HoffmanLabs LLC