[Info-vax] Character sets
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Thu Sep 8 19:22:05 EDT 2022
On 2022-09-07 23:19:56 +0000, Arne Vajhøj said:
> On 9/7/2022 9:08 AM, Johnny Billquist wrote:
>> On 2022-09-06 20:42, Arne Vajhøj wrote:
>>> On 9/3/2022 3:30 PM, Stephen Hoffman wrote:
>>>> Pedant notes: yes, I do know about wchar_t and friends in C and C++,
>>>> which is... a mess, and is also ill-suited for UTF-8. Probably better
>>>> to use char16_t and char32_t, if you do need fixed-width wide character
>>>> storage.
>>>
>>> wchar_t is a typical C vague definition where char16_t and char32_t are
>>> much more clearly defined.
>>
>> wchar_t was an invention from before Unicode came about. And it's
>> fairly incompatible with the ideas in Unicode.
>
> It is crazy vague in the C standard.
>
> But on common platforms it is just utf-16 or utf-32.
Maybe I was unclear.
C string handling is bad.
C UTF-8 handling is worse.
As defined, wchar_t is... less than useful.
Sure, if I want piles of glue code, it can be sorta workable. Kinda. Maybe.
But I have OpenVMS for excess glue code.
And OpenVMS UTF-8 handling is ~negligible.
That includes BASIC, Fortran, COBOL, Pascal, and the entirety of the
OpenVMS system APIs, aside from the UTF-8 support in ODS-5.
While a step or three in the right direction, Python, Perl, and Java
won't help with OpenVMS here, either.
Not on OpenVMS.
--
Pure Personal Opinion | HoffmanLabs LLC