[Info-vax] Character sets

Arne Vajhøj arne at vajhoej.dk
Tue Sep 6 19:32:46 EDT 2022


On 9/6/2022 4:31 PM, Stephen Hoffman wrote:
> On 2022-09-06 18:42:53 +0000, Arne Vajhøj said:
> 
>> On 9/3/2022 3:30 PM, Stephen Hoffman wrote:
>>> Pedant notes: yes, I do know about wchar_t and friends in C and C++, 
>>> which is... a mess, and is also ill-suited for UTF-8.  Probably 
>>> better to use char16_t and char32_t, if you do need fixed-width wide 
>>> character storage.
>>
>> wchar_t is a typical C vague definition where char16_t and char32_t 
>> are much more clearly defined.
>>
>> But wchar_t got runtime support.
> 
> Run-time support which is less than useful for most purposes, 
> particularly given the definition and the ~portability issues.

I think I would miss the wcs* and isw* functions and the w versions
of the IO functions.
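
To make concrete what that runtime support looks like, here is a minimal
sketch using wcslen, iswalpha and towupper. The function name
upcase_and_count_alpha is made up for illustration, and anything beyond
ASCII depends on the current locale.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: upper-case a wide string in place and return
   how many alphabetic characters it holds. Which characters beyond
   ASCII count as alphabetic depends on the current locale. */
size_t upcase_and_count_alpha(wchar_t *s)
{
    assert(s != NULL);
    size_t alpha = 0;
    size_t n = wcslen(s);            /* length in wchar_t units */
    for (size_t i = 0; i < n; i++) {
        if (iswalpha((wint_t)s[i]))
            alpha++;
        s[i] = (wchar_t)towupper((wint_t)s[i]);
    }
    return alpha;
}
```

None of this has a char16_t or char32_t counterpart in the standard
library, which is the part I would miss.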

>> C (and for that matter also C++) IO functions do not make 
>> writing/reading UTF-8 easy.
> 
> The C I/O functions do ~mostly fine.
> 
> Semi-recent Clang, else-platform:
> 
> 
> $ cc x.c -o x
> $ ~/x
> hello 🗺
> $ cat x.c
> #include <stdio.h>
> #include <stdlib.h>
> 
> int  main(void)
> {
>   printf("hello 🗺\n");
>   exit(EXIT_SUCCESS);
> }

That is C IO processing bytes where the application
has put UTF-8 in.
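
A small sketch of what "processing bytes" means here: the C library
sees ten bytes in that string, not seven characters. The helper
utf8_code_points and the array greeting are made-up names, and the
helper assumes its input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strlen, for comparing bytes to code points */

/* Hypothetical helper: count code points in a UTF-8 string by
   skipping continuation bytes (bit pattern 10xxxxxx).
   Assumes s is valid UTF-8; no validation is done. */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* "hello " followed by U+1F5FA written out as its four UTF-8 bytes,
   so the example does not depend on the source file encoding. */
static const char greeting[] = "hello " "\xF0\x9F\x97\xBA";
```

strlen(greeting) is 10 while utf8_code_points(greeting) is 7. printf
just copies those ten bytes to the output; it is the terminal that
interprets them as UTF-8.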

What is needed is something where the application
passes Unicode (wchar_t* or char16_t* or char32_t*)
to an IO function and it converts to a specified encoding,
UTF-8 or otherwise.
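
As a rough sketch of the kind of function I mean, hand-rolled and
limited to UTF-8 as the target: the names utf8_encode and
fput_utf32_as_utf8 are made up, nothing like them is in the C standard
library, and char32_t via <uchar.h> assumes a C11 compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <uchar.h>   /* char32_t (C11) */

/* Hypothetical helper: encode one code point as UTF-8 into out.
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(char32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                     /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical IO function: write a NUL-terminated UTF-32 string to
   fp as UTF-8. Returns 0 on success, -1 on an invalid code point or
   a write error. */
int fput_utf32_as_utf8(const char32_t *s, FILE *fp)
{
    unsigned char buf[4];
    for (; *s != 0; s++) {
        size_t n = utf8_encode(*s, buf);
        if (n == 0 || fwrite(buf, 1, n, fp) != n)
            return -1;
    }
    return 0;
}
```

With that, fput_utf32_as_utf8(U"hello \U0001F5FA\n", stdout) has the
application passing Unicode and the IO layer producing the bytes. A
real solution would take the target encoding as a parameter instead
of hardcoding UTF-8.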

>> Newer languages do much better.
> 
> Of course. Objective C does far better here too, and that language is 
> hardly new. As do Perl and Python, as were mentioned by others.
> 
> On OpenVMS, BASIC is probably most obvious candidate for adding UTF-8 
> and a more general ooverhaul.
> 
> But there are oothers using BASIC that would never get oover that, and 
> would oobject to OOBASIC.
> 
> Any retrofit of UTF-8 and adding UTF-8 and/or OO support into the 
> OpenVMS platform is a yet larger effort.

The most obvious languages for adding OO are Basic and Pascal
(other platforms have proven that it works, unlike Fortran
and Cobol, where interest is minimal).

For UTF-8 support I would probably say Pascal, Basic and Cobol.
Not so relevant for Fortran. C/C++ will have to wait for the
standard for a nice solution, although various hacks are already
possible.

Arne




