[Info-vax] 8-bit characters
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Wed Nov 10 20:28:26 EST 2021
On 2021-11-11 00:21:38 +0000, Jan-Erik Søderholm said:
> Now, UTF8 is just a "row of bytes", so if you use (as an example) Putty
> in its default setup using UTF8, you can type (or copy/paste) any UTF8
> character into Putty and it will be stored using whatever editor you
> are using. It is just a row of bytes, so there is no specific need for
> any "UTF8 support" for doing just that.
You're quite possibly headed for a few surprises within OpenVMS apps,
even if you're only cutting-and-pasting wads of bytes around. Not the
least of these is counting characters, code points, or clusters: one
byte is no longer one character, so is the app asking for the buffer
size or for the character/code point/cluster length, and what does it
do with the zero-width stuff? Then there's the fun that is
directionality (there's a recent CVE related to this), identifying the
encoding and the language of each string, normalization, and the
inherent language-sensitivity of strings for purposes such as sorting.
In aggregate, some baked-in app and OpenVMS API assumptions, and
developers' own assumptions, about strings can and will break. Sure,
UTF-8 is a "row of bytes", with some caveats. Sort of. Mostly.
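To make a few of those caveats concrete, here is a minimal sketch in Python (chosen only for brevity; nothing here is OpenVMS-specific). The helper names describe and find_bidi_controls are made up for this illustration. The "approx_clusters" figure is a rough stand-in that skips combining marks and format characters; it is not the real grapheme cluster segmentation from the Unicode rules, which needs more than the standard library provides.

```python
import unicodedata

# Bidi classes of the explicit directional formatting controls
# (embeddings, overrides, isolates, and their terminators).
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI",
}


def describe(text):
    """Return three different answers to 'how long is this string?'."""
    return {
        # Storage size once encoded as UTF-8.
        "bytes": len(text.encode("utf-8")),
        # Number of Unicode code points.
        "code_points": len(text),
        # ASSUMPTION: crude approximation of user-perceived characters.
        # Skips combining marks (M*) and format characters (Cf), which
        # covers accents and zero-width controls but not emoji sequences
        # or other full grapheme-cluster rules.
        "approx_clusters": sum(
            1
            for ch in text
            if not unicodedata.category(ch).startswith("M")
            and unicodedata.category(ch) != "Cf"
        ),
    }


def find_bidi_controls(text):
    """List (index, bidi class) for each directional control in text."""
    return [
        (i, unicodedata.bidirectional(ch))
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]


composed = "caf\u00e9"     # e-acute as one code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent

print(describe(composed))    # {'bytes': 5, 'code_points': 4, 'approx_clusters': 4}
print(describe(decomposed))  # {'bytes': 6, 'code_points': 5, 'approx_clusters': 4}

# Same text on screen, different bytes: equal only after normalization.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# A right-to-left override hiding in otherwise plain text.
print(find_bidi_controls("abc\u202edef"))  # [(3, 'RLO')]
```

The two spellings of the same word differ in byte count and in code point count and do not compare equal until normalized, which is the sort of thing that trips up code written on the assumption that one byte is one character.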
A few corners of OpenVMS have some UTF-8 support: the XQP is one, C
(including the I18N bits) is another, and Java is a third. I'd expect
that pervasive support is at least a decade away.
--
Pure Personal Opinion | HoffmanLabs LLC