[Info-vax] 8-bit characters

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Wed Nov 10 20:28:26 EST 2021


On 2021-11-11 00:21:38 +0000, Jan-Erik Søderholm said:

> Now, UTF8 is just a "row of bytes", so if you use (as an example) Putty 
> in its default setup using UTF8, you can type (or copy/paste) any UTF8 
> character into Putty and it will be stored using whatever editor you 
> are using. It is just a row of bytes, so there is no specific need for
> any "UTF8 support" for doing just that.

You're quite possibly headed for a few surprises within OpenVMS apps, 
even if you're only cutting-and-pasting wads of bytes around. Not the 
least of these is counting characters, code points, or clusters: one 
byte is no longer one character, so is the app looking for the buffer 
size or for the character, code point, or cluster length, and what 
should it do with the zero-width stuff? Then there's the fun that is 
directionality (there's a recent CVE related to this), identifying the 
string encoding and the string language for each string, 
normalization, and the inherent language-sensitivity of strings for 
purposes such as sorting. In aggregate, some baked-in app and OpenVMS 
API assumptions, and developers' own assumptions, about strings can 
and will break. Sure, UTF-8 is a "row of bytes", with some caveats. 
Sort of. Mostly.
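
As a rough illustration of a few of those caveats, here is a minimal 
Python sketch. It is not OpenVMS-specific, and the helper names 
(describe, has_bidi_controls) and the sample strings are made up for 
this example. It shows byte counts diverging from code point counts, a 
fixed-size byte buffer splitting a character, two encodings of the same 
visible text comparing unequal until normalized, a zero-width code 
point, a directionality control, and a code-point sort that ignores 
language rules.

```python
import unicodedata

# Explicit bidirectional formatting controls (embeddings, overrides, isolates).
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def describe(text):
    """Return (UTF-8 byte count, code point count) for a string."""
    return len(text.encode("utf-8")), len(text)


def has_bidi_controls(text):
    """Return True if the string contains any explicit bidi control."""
    return any(ch in BIDI_CONTROLS for ch in text)


# The same visible word, stored two ways.
composed = "caf\u00e9"      # e-acute as a single code point
decomposed = "cafe\u0301"   # plain e followed by a combining acute accent

# Byte length, code point length, and what the user sees all differ.
composed_sizes = describe(composed)
decomposed_sizes = describe(decomposed)

# Byte-wise comparison says the two strings differ; NFC makes them equal.
equal_raw = composed == decomposed
equal_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Truncating at a byte count can cut a multi-byte sequence in half.
truncated = composed.encode("utf-8")[:4]
try:
    truncated.decode("utf-8")
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False

# A zero-width space occupies bytes and a code point but no columns.
zero_width_sizes = describe("a\u200bb")

# A right-to-left override changes display order, not storage order.
suspicious = has_bidi_controls("abc\u202edef")

# Sorting by code point puts the non-ASCII name after "zebra".
by_code_point = sorted(["zebra", "\u00c5se", "apple"])

print(composed_sizes, decomposed_sizes, equal_raw, equal_nfc)
print(truncation_ok, zero_width_sizes, suspicious, by_code_point)
```

Counting user-perceived characters (grapheme clusters) and sorting per 
language need more than this; both depend on Unicode data tables and 
locale rules that this sketch deliberately leaves out.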

A few corners of OpenVMS have some UTF-8 support. One is the XQP. 
Another is C, including the I18N bits. Java is a third. I'd expect 
that pervasive support is at least a decade away.




-- 
Pure Personal Opinion | HoffmanLabs LLC 



