[Info-vax] ODS-5 specifications, was: Re: Does OpenVMS Use Unicode?

Stephen Hoffman seaohveh at hoffmanlabs.invalid
Tue Jun 14 21:05:28 EDT 2016


On 2016-06-14 22:17:42 +0000, David Froble said:

> Just about every time I see something about character coding beyond 
> Ascii, I see references to the C RTL.  So which is it, VMS supports 
> this stuff, or the C RTL supports this stuff?

C has more support for alternate character sets and encodings than does 
OpenVMS itself.

OpenVMS itself has little support for encodings past ISO Latin 1 (as 
that's very nearly DEC MCS, which is supported).  AFAIK, the National 
Replacement Character Set (NRCS) support was deprecated long ago, and 
that never got around to Unicode or UTF-8 or such.  There's the UCS-2 
and VTF-7 support that's been discussed recently, and that's variously 
sort-of documented and also sort-of undocumented.

> In my opinion, which nobody has to respect, only if the support was 
> part of VMS, not some language, could it be claimed that VMS supported 
> this stuff, or just about anything else.  If I cannot use it as a 
> general VMS capability, then it's just a C application.
> 
> :-)

Whether we rewind to the drag-DCL-forward discussion from some months 
back, or to the more recent discussions of the escaping syntax in ODS-5 
(or the escaping needed to get from 32- to 64-bit addressing or any 
number of other areas, for that matter, but I digress), these 
changes can mean that some old apps might need to be reworked or 
rebuilt.

Adding UTF-8 support would likely require some changes to existing 
BASIC applications, as details such as determining the string length 
run afoul of one-character-one-byte assumptions.  String sorting can 
also arise here.  Upcasing and downcasing have been discussed 
recently, too.  That'll all need to be reviewed, as the length in 
bytes is equal to or larger than the length of the string in 
characters, and language-specific sorting can be necessary.  (NRCS 
dealt with these sorts of details for languages within the reach of 
DEC MCS, though.)  Adding UTF-8 means having access to the characters 
necessary for most any language, and having to become rather more 
familiar with the post-ASCII post-MCS world.
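
To make the byte-versus-character point concrete, here's a rough sketch 
in Python (not BASIC, and not anything OpenVMS-specific).  The helper 
names byte_length, char_length and fold_key are made up for this 
illustration, and the accent-stripping sort key is only a crude 
stand-in for real language-specific collation, which needs locale data.

```python
import unicodedata

def byte_length(text: str) -> int:
    """Bytes needed to store the string as UTF-8."""
    return len(text.encode("utf-8"))

def char_length(text: str) -> int:
    """Number of Unicode code points in the string."""
    return len(text)

def fold_key(text: str) -> str:
    """Hypothetical sort key: decompose, drop combining marks, casefold.
    A crude approximation only; real collation rules vary by language."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

# Byte length equals character length only for pure ASCII.
for sample in ["VMS", "Z\u00fcrich", "\u65e5\u672c\u8a9e"]:
    print(repr(sample), char_length(sample), "chars,",
          byte_length(sample), "bytes")

# Upcasing can change the length: German sharp s becomes "SS".
lower = "stra\u00dfe"
upper = lower.upper()
print(char_length(lower), "->", char_length(upper))

# Sorting by raw code point puts the accented capital after "z".
words = ["zebra", "\u00c4pfel", "apple"]
by_code_point = sorted(words)
by_fold_key = sorted(words, key=fold_key)
print(by_code_point)
print(by_fold_key)
```

Any fixed-length record field or string descriptor sized in bytes hits 
the same mismatch, which is why existing apps would need a review 
rather than just a rebuild.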


-- 
Pure Personal Opinion | HoffmanLabs LLC 



