[Info-vax] ODS-5 specifications, was: Re: Does OpenVMS Use Unicode?

Wed Jun 15 01:35:13 EDT 2016

Stephen Hoffman wrote:
> On 2016-06-14 22:17:42 +0000, David Froble said:
> 
>> Just about every time I see something about character coding beyond 
>> ASCII, I see references to the C RTL.  So which is it, VMS supports 
>> this stuff, or the C RTL supports this stuff?
> 
> C has more support for alternate character sets and encodings than does 
> OpenVMS itself.
> 
> OpenVMS itself has little support for encodings past ISO Latin 1 (as 
> that's very nearly DEC MCS, which is supported).    AFAIK, the national 
> replacement character set NRCS support was deprecated long ago, and that 
> never got around to Unicode or UTF-8 or such.   There's the UCS-2 and 
> VTF-7 support that's been discussed recently, and that's variously 
> sort-of documented and also sort-of undocumented.
> 
>> In my opinion, which nobody has to respect, only if the support was 
>> part of VMS, not some language, could it be claimed that VMS supported 
>> this stuff, or just about anything else.  If I cannot use it as a 
>> general VMS capability, then it's just a C application.
>>
>> :-)
> 
> We can rewind back to the drag-DCL-forward discussion from some months 
> back, or to the more recent discussions of the escaping syntax in ODS-5 
> (or the escaping needed to get from 32- to 64-bit addressing or any 
> number of other areas, for that matter, but I digress), but these 
> changes can mean that some old apps might need to be reworked or rebuilt.
> 
> Adding UTF-8 support would likely require some changes to existing BASIC 
> applications, as details such as determining the string length runs 
> afoul of one-character-one-byte assumptions.  String sorting issues can 
> also arise here.    Upcasing and downcasing have been discussed recently, 
> too.   That'll all need to be reviewed, as the length in bytes is equal 
> to or larger than the length of the string in characters, and 
> language-specific sorting can be necessary.    (NRCS dealt with 
> these sorts of details for languages within the reach of DEC MCS, 
> though.)   Adding UTF-8 means having access to characters necessary for 
> most any language, and having to become rather more familiar with the 
> post-ASCII post-MCS world.
> 
> 

I would think that if BASIC, and others, use valid VMS data types, then the 
support for those data types just might mean the language really doesn't have to 
care about the structure of the data type.

What's maybe missing is valid VMS data types that support extended character 
sets.  I guess I just don't understand why the extended character stuff was put 
in some language RTL instead of VMS data types.
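
To make the quoted point about string length concrete, here is a minimal sketch in 
plain C (not VMS-specific, and not anything from the C RTL or from BASIC) of why 
the byte count and the character count diverge under UTF-8.  The function name 
utf8_codepoints is made up for illustration, and the code assumes the input is 
already well-formed UTF-8.  It counts code points, which is still not quite the 
same thing as user-visible characters once combining marks are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
   UTF-8 string.  Assumes well-formed UTF-8; no validation is done. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Continuation bytes have the bit pattern 10xxxxxx.  Every
           other byte starts a new code point, so count only those. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the word "naive" spelled with an i-diaeresis (bytes 0xC3 0xAF in UTF-8), 
strlen() reports 6 bytes while utf8_codepoints() reports 5 code points.  Any code 
that treats the byte length as the character length is off by one here, which is 
the one-character-one-byte assumption mentioned above.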