[Info-vax] Character sets

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Tue Sep 6 16:31:16 EDT 2022


On 2022-09-06 18:42:53 +0000, Arne Vajhøj said:

> On 9/3/2022 3:30 PM, Stephen Hoffman wrote:
>> Pedant notes: yes, I do know about wchar_t and friends in C and C++, 
>> which is... a mess, and is also ill-suited for UTF-8.  Probably better 
>> to use char16_t and char32_t, if you do need fixed-width wide character 
>> storage.
> 
> wchar_t is a typical C vague definition where char16_t and char32_t are 
> much more clearly defined.
> 
> But wchar_t got runtime support.

Run-time support which is less than useful for most purposes, 
particularly given the definition and the ~portability issues.
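
For the fixed-width storage case mentioned upthread, here is a rough 
sketch of decoding one UTF-8 sequence into a char32_t by hand, with no 
wchar_t and no locale involved. Assumptions: a C11 compiler that 
provides <uchar.h> (not every platform does), and utf8_decode is a 
made-up name for this example, not a standard or OpenVMS routine.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>   /* char32_t; assumes a C11 library that ships this header */

/*
 * Hypothetical helper: decode one code point from the n bytes at s.
 * Stores the code point in *out and returns the number of bytes used,
 * or returns 0 if the input is truncated or not well-formed UTF-8.
 */
size_t utf8_decode(const unsigned char *s, size_t n, char32_t *out)
{
    /* Smallest code point that legitimately needs 1..4 bytes. */
    static const char32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    char32_t cp;

    assert(s != NULL && out != NULL);
    if (n == 0)
        return 0;

    /* The lead byte gives the sequence length and the first payload bits. */
    if (s[0] < 0x80)                { cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (len > n)
        return 0;                   /* truncated sequence */

    /* Each continuation byte is 10xxxxxx and carries six payload bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (char32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF. */
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return len;
}
```

Fed the four bytes F0 9F 97 BA, that consumes four bytes and yields 
U+1F5FA, which fits in one char32_t on any platform. The same cannot be 
said of wchar_t, which is 16 bits on some platforms.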

> C (and for that matter also C++) IO functions do not make 
> writing/reading UTF-8 easy.

The C I/O functions do ~mostly fine.

Semi-recent Clang, else-platform:


$ cc x.c -o x
$ ~/x
hello 🗺
$ cat x.c
#include <stdio.h>
#include <stdlib.h>

int  main(void)
{
  printf("hello 🗺\n");
  exit(EXIT_SUCCESS);
}
$


The C character functions are decidedly lacking, but then C character 
functions are also lacking for existing ISO Latin 1 / DEC MCS strings, 
too.
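
As a small illustration of that gap: strlen() counts bytes, and nothing 
in <string.h> or <ctype.h> counts code points, so you end up writing 
that yourself. A minimal sketch, assuming the input is already valid 
UTF-8; utf8_codepoints is a hypothetical name, not a library routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the code points in a NUL-terminated string
 * that is assumed to be valid UTF-8. Every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new code point.
 */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the "hello 🗺" string from the example above (without the newline), 
strlen() reports 10 bytes, while this reports 7 code points. Neither 
number is the count of user-perceived characters in the general case; 
grapheme clusters are where ICU or libunistring come in.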

C++ does better, here.

Extracted from my previous reply and re-posted here:

>> Little (nothing?) past the ODS-5 UTF-8 filename work exists with 
>> OpenVMS, and—as with most of the retrofit-compatible-hackery—that's 
>> less than easy for apps to use.
>> ** You'll probably be using or porting recent versions of ICU, 
>> libunistring, or ilk, **
>> and the OpenVMS 32- and 64-bit string descriptors are unfortunately 
>> also less than useful here around language and encoding. (This is where 
>> the object abstraction shines, too. It's what descriptors and itemlists 
>> evolved into, on other platforms.)

OpenVMS does have an older version of ICU support.

Wondering why the obscurely-named I18N kit is still optional is akin to 
wondering why IP networking is still optional. But I digress.

> Newer languages do much better.

Of course. Objective C does far better here too, and that language is 
hardly new. As do Perl and Python, as were mentioned by others.

On OpenVMS, BASIC is probably the most obvious candidate for adding UTF-8 
and a more general ooverhaul.

But there are oothers using BASIC that would never get oover that, and 
would oobject to OOBASIC.

Any retrofit of UTF-8 and/or OO support into the OpenVMS platform is a 
yet larger effort.

And for now, work with no obvious nor direct payback for VSI.

-- 
Pure Personal Opinion | HoffmanLabs LLC 



