[Info-vax] Character sets
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Tue Sep 6 16:31:16 EDT 2022
On 2022-09-06 18:42:53 +0000, Arne Vajhøj said:
> On 9/3/2022 3:30 PM, Stephen Hoffman wrote:
>> Pedant notes: yes, I do know about wchar_t and friends in C and C++,
>> which is... a mess, and is also ill-suited for UTF-8. Probably better
>> to use char16_t and char32_t, if you do need fixed-width wide character
>> storage.
>
> wchar_t is a typical C vague definition where char16_t and char32_t are
> much more clearly defined.
>
> But wchar_t got runtime support.
Run-time support which is less than useful for most purposes,
particularly given the definition and the ~portability issues.
> C (and for that matter also C++) IO functions do not make
> writing/reading UTF-8 easy.
The C I/O functions do ~mostly fine.
Semi-recent Clang, else-platform:
$ cc x.c -o x
$ ~/x
hello 🗺
$ cat x.c
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    printf("hello 🗺\n");
    exit(EXIT_SUCCESS);
}
$
The C character functions are decidedly lacking, but then C character
functions are also lacking for existing ISO Latin 1 / DEC MCS strings,
too.
C++ does better, here.
Extracted from my previous reply and re-posted here:
>> Little (nothing?) past the ODS-5 UTF-8 filename work exists with
>> OpenVMS, and—as with most of the retrofit-compatible-hackery—that's
>> less than easy for apps to use.
>> ** You'll probably be using or porting recent versions of ICU,
>> libunistring, or ilk, **
>> and the OpenVMS 32- and 64-bit string descriptors are unfortunately
>> also less than useful here around language and encoding. (This is where
>> the object abstraction shines, too. It's what descriptors and itemlists
>> evolved into, on other platforms.)
OpenVMS does have an older version of ICU support.
Wondering why the obscurely-named I18N kit is still optional is akin to
wondering why IP networking is still optional. But I digress.
> Newer languages do much better.
Of course. Objective C does far better here too, and that language is
hardly new. As do Perl and Python, as were mentioned by others.
On OpenVMS, BASIC is probably the most obvious candidate for adding UTF-8
and a more general ooverhaul.
But there are oothers using BASIC that would never get oover that, and
would oobject to OOBASIC.
Any retrofit of UTF-8 and/or OO support into the OpenVMS platform as a
whole is a yet larger effort.
And for now, work with no obvious nor direct payback for VSI.
--
Pure Personal Opinion | HoffmanLabs LLC