[Info-vax] ODS-5 specifications, was: Re: Does OpenVMS Use Unicode?

Wed Jun 15 08:38:31 EDT 2016

On 2016-06-15 05:35:13 +0000, David Froble said:

> I would think that if Basic, and others, use valid VMS data types, then 
> the support for those data types just might mean the language really 
> doesn't have to care about the structure of the data type.
> 
> What's maybe missing is valid VMS data types that support extended 
> character sets.  I guess I just don't understand why the extended 
> character stuff was put in some language RTL instead of VMS data types.

It'll involve the language RTLs irrespective of descriptor support.

Some UTF-8 characters are multiple bytes.

With an encoding like UTF-8, there are two different lengths 
associated with a character string: the displayed or rendered 
length of the string (for display and printing purposes), and the 
length required to store the string in memory or in a record in an RMS 
file or database field.  That means that the string length functions 
that you're presently accustomed to calling will fail you for some of 
what you're potentially using them for.  That's independent of the 
encoding metadata.
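
As a rough illustration of those two lengths, here is a minimal Python 
sketch (Python only because it makes the distinction easy to show; the 
helper names char_count and byte_count are made up for this example). 
It counts Unicode code points as a stand-in for the rendered length, 
which is itself an approximation once combining characters are involved.

```python
def char_count(text: str) -> int:
    """Number of Unicode code points; a rough proxy for rendered length."""
    return len(text)


def byte_count(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes needed to store the string in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    # Escapes are used so the sample data does not depend on the
    # encoding of this source file.
    samples = ["OpenVMS", "caf\u00e9", "\u20ac100"]
    for s in samples:
        print(f"{s!r}: {char_count(s)} characters, {byte_count(s)} bytes")
```

For pure ASCII text the two counts agree, which is why existing code 
that treats "length" as one number has gotten away with it so far.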

Sure, there could be descriptors, but all the fields in the existing 
32-bit descriptors are used, so you'd have to extend the text type to 
also include the encoding, or use a different (and longer) 32-bit 
descriptor.  BASIC doesn't do 64-bit, though there'd be similar 
shuffles there, too.  But again, the rendered length is equal to or less 
than the stored length in UTF-8 and some other encodings, and you'd 
prefer to have a way to store the encoding.
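
To make the "no room left" point concrete, here is a hedged Python 
sketch of the classic 32-bit string descriptor layout (16-bit length, 
8-bit data type, 8-bit class, 32-bit pointer) next to a purely 
hypothetical longer variant.  The extended layout, the ENC_UTF8 code, 
and the pack_extended helper are assumptions invented for illustration; 
nothing like them exists in OpenVMS as far as this sketch is concerned.

```python
import struct

# Classic 32-bit descriptor: length (16 bits), dtype (8), class (8),
# pointer (32).  Little-endian, no padding: every bit is spoken for.
DESC32 = struct.Struct("<HBBI")

# HYPOTHETICAL longer descriptor: the same fields, plus a 16-bit
# encoding code and a 16-bit character count.  Not a real structure.
DESC32_EXT = struct.Struct("<HBBIHH")

DTYPE_T = 14   # DSC$K_DTYPE_T (text), as I recall the value
CLASS_S = 1    # DSC$K_CLASS_S (static string), as I recall the value
ENC_UTF8 = 1   # hypothetical encoding code, made up for this sketch


def pack_extended(text: str, address: int) -> bytes:
    """Build the hypothetical descriptor for a UTF-8 string at 'address'."""
    stored = text.encode("utf-8")
    return DESC32_EXT.pack(
        len(stored),   # stored length in bytes
        DTYPE_T,
        CLASS_S,
        address,       # 32-bit address of the string data
        ENC_UTF8,      # encoding metadata
        len(text),     # length in code points
    )


if __name__ == "__main__":
    print("classic size:", DESC32.size, "bytes")
    print("extended size:", DESC32_EXT.size, "bytes")
    print(DESC32_EXT.unpack(pack_extended("caf\u00e9", 0x1000)))
```

Any code that assumes the existing descriptor size, or that only knows 
the existing type codes, would not understand the longer form, which is 
the compatibility problem in miniature.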

The "proper" way to do this in BASIC would be to switch everything over 
to UTF-8, and to provide a different set of calls for counting 
characters and counting bytes.  The downside is that you'll need to 
switch over the text editors, the linker will need changes if you allow 
UTF-8 symbols, the display devices and the terminal driver and SET and 
SHOW TERMINAL would need updates for UTF-8 support, and there'll be 
other giblets affected.  This also easily leaks over into the DCL 
interpreter and other pieces of OpenVMS.

Moving OpenVMS forward is no small project, and it's not isolated to 
string descriptors.  It'll also break existing code, which gets back to 
my usual comments on the double-edged nature of compatibility, when you 
finally get around to actually needing whatever was implemented in a 
"compatible" fashion, or when you just can't make changes without 
breaking existing code.  Adding UTF-8 is either going to break 
applications, or it's going to be an incredibly convoluted design.

-- 
Pure Personal Opinion | HoffmanLabs LLC