[Info-vax] Character sets
Arne Vajhøj
arne at vajhoej.dk
Tue Sep 6 19:32:46 EDT 2022
On 9/6/2022 4:31 PM, Stephen Hoffman wrote:
> On 2022-09-06 18:42:53 +0000, Arne Vajhøj said:
>
>> On 9/3/2022 3:30 PM, Stephen Hoffman wrote:
>>> Pedant notes: yes, I do know about wchar_t and friends in C and C++,
>>> which is... a mess, and is also ill-suited for UTF-8. Probably
>>> better to use char16_t and char32_t, if you do need fixed-width wide
>>> character storage.
>>
>> wchar_t has a typically vague C definition, whereas char16_t and
>> char32_t are much more clearly defined.
>>
>> But wchar_t has runtime support.
>
> Run-time support which is less than useful for most purposes,
> particularly given the definition and the ~portability issues.
I think I would miss the wcs* and isw* functions and the w versions of the I/O functions.
>> The C (and for that matter also C++) I/O functions do not make
>> writing/reading UTF-8 easy.
>
> The C I/O functions do ~mostly fine.
>
> Semi-recent Clang, else-platform:
>
>
> $ cc x.c -o x
> $ ~/x
> hello 🗺
> $ cat x.c
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(void)
> {
> printf("hello 🗺\n");
> exit(EXIT_SUCCESS);
> }
That is C I/O processing bytes into which the application
has already put UTF-8.
What is needed is something where the application
passes Unicode (wchar_t*, char16_t* or char32_t*)
to an I/O function, and the function converts it to a
specified encoding, UTF-8 or otherwise.
>> Newer languages do much better.
>
> Of course. Objective C does far better here too, and that language is
> hardly new. So do Perl and Python, as others mentioned.
>
> On OpenVMS, BASIC is probably the most obvious candidate for adding UTF-8
> and a more general ooverhaul.
>
> But there are oothers using BASIC that would never get oover that, and
> would oobject to OOBASIC.
>
> Any retrofit of UTF-8 and/or OO support into the OpenVMS platform
> as a whole is a yet larger effort.
The most obvious languages for adding OO are Basic and Pascal
(other platforms have proven that it works, unlike Fortran
and Cobol, where interest is minimal).
For UTF-8 support I would probably say Pascal, Basic and Cobol.
It is not so relevant for Fortran. And C/C++ will have to wait for the
standard to provide a nice solution, although various hacks are already
possible.
Arne