[Info-vax] 8-bit characters
Jan-Erik Söderholm
jan-erik.soderholm at telia.com
Wed Nov 10 19:21:38 EST 2021
On 2021-11-10 at 19:04, Stephen Hoffman wrote:
> On 2021-11-10 09:44:34 +0000, Phillip Helbig (undress to reply) said:
>
>> Having to write some Icelandic words in a DECterm (as one does), I
>> notice that COMPOSE-T-H and COMPOSE-t-h create upper and lower case thorn
>> (Þ þ if those characters get through). Both create the
>> character, unless it is at the beginning of a line, in which case one
>> sees <XDE> or <XFE> (one character, displayed as several). ASCII values
>> are 222 and 254. Refreshing the screen also causes the mnemonics to
>> appear. Also, they are not displayed via HELP FORTRAN CHAR DEC.
>>
>> Any deeper reason or just flaky instrumentation?
>>
>> I also notice that × (COMPOSE-x-x) works fine in a DECterm but not on a
>> real VT220 (where most or all other composed characters work). Again,
>> deeper meaning or just flaky?
>
> You're definitely not looking at ASCII, and AFAIK Þ and þ aren't in DEC
> MCS, which likely means you're looking at inconsistent handling of or
> inconsistent configuration of ISO 8859-1 among your apps and OS and
> hardware; I'd guess some here is MCS, and some 8859-1.
>
> You've asked variations of this question over the years too, usually
> involving trying to use EDT past ASCII or maybe past DEC MCS.
>
> https://groups.google.com/g/comp.os.vms/c/QAQAyRo9BPM/m/IrmCw1UJBQAJ
> https://groups.google.com/g/comp.os.vms/c/Yji2Tufvv7k/m/mhUy-zKXAAAJ
> etc.
>
> This is part of the (lack of) UTF-8 and Unicode support in OpenVMS and its
> tooling that I've grumbled about. Not that adding UTF-8 and Unicode support is
> ever going to be a small overhaul.
>
Now, UTF8 is just a "row of bytes", so if you use (as an example) PuTTY
in its default setup using UTF8, you can type (or copy/paste) any UTF8
character into PuTTY and it will be stored by whatever editor you
are using. It is just a row of bytes, so there is no specific need for
any "UTF8 support" for doing just that.

Later on, if you send the same text to some UTF8-compatible display (like
another PuTTY session using the default UTF8 setup, or a web browser using
UTF8 encoding), the Icelandic characters will be displayed just fine.
But if you are using some display tool that doesn't support UTF8, you
will get garbled text, of course. But that is not the fault of OpenVMS.
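
To make the "row of bytes" point concrete, here is a minimal Python sketch.
It is only an illustration, not anything OpenVMS-specific; the variable
names are made up, and ISO 8859-1 stands in for "a display that doesn't
support UTF8".

```python
# Illustrative sketch only; all names here are hypothetical.
text = "Þ þ ×"

# What a UTF8 terminal sends and what an editor stores: plain bytes.
stored = text.encode("utf-8")
print(stored.hex(" "))  # each of the three characters takes two bytes

# A UTF8-aware display turns the same bytes back into the same characters.
as_utf8 = stored.decode("utf-8")
print(as_utf8)

# A display that assumes one byte per character (ISO 8859-1 here) shows
# two characters for every non-ASCII one, i.e. garbled text.
as_latin1 = stored.decode("latin-1")
print(repr(as_latin1))
```

Nothing in between needs to understand UTF8; only the two ends have to
agree on how the bytes are to be interpreted.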

It is unclear whether ISO/IEC 646 has or had support for Icelandic
characters; the Wikipedia page has an entry for "IS" in some tables but
no real data.
https://en.wikipedia.org/wiki/ISO/IEC_646

Then of course, it is a totally different matter if you are talking about
UTF8 support for symbols/variables in compilers or in file/directory
names, but that is a totally different area from just storing and
displaying some "data" that happens to include UTF8 sequences.
But that isn't in the scope of the question asked.

I would not expect tools like DECterm or VT220 (really?) to handle
UTF8 or anything else outside the DEC-MCS range of characters. If you
need that, simply use a modern tool from the last 20 years or so.
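
As for the values 222 and 254 quoted above, a quick check (again just a
Python sketch with made-up names) shows that they are the hex DE and FE
seen in <XDE> and <XFE>, that they are outside 7-bit ASCII, and that they
are thorn only if the bytes are read as ISO 8859-1. How DECterm or a
VT220 treats them is not something this sketch can show.

```python
# Illustrative sketch only; all names here are hypothetical.
codes = [222, 254]
raw = bytes(codes)

# 222 and 254 in hex: the DE and FE shown as <XDE> and <XFE>.
hex_codes = [format(c, "02X") for c in codes]
print(hex_codes)

# ASCII is a 7-bit code (0..127), so these bytes are not ASCII at all.
try:
    raw.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False
print(is_ascii)

# Read as ISO 8859-1, the same two bytes are upper and lower case thorn.
as_latin1 = raw.decode("latin-1")
print(as_latin1)
```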