[Info-vax] ODS-5 specifications, was: Re: Does OpenVMS Use Unicode?
Stephen Hoffman
seaohveh at hoffmanlabs.invalid
Wed Jun 15 08:38:31 EDT 2016
On 2016-06-15 05:35:13 +0000, David Froble said:
> I would think that if Basic, and others, use valid VMS data types, then
> the support for those data types just might mean the language really
> doesn't have to care about the structure of the data type.
>
> What's maybe missing is valid VMS data types that support extended
> character sets. I guess I just don't understand why the extended
> character stuff was put in some language RTL instead of VMS data types.
It'll involve the language RTLs irrespective of descriptor support,
because some UTF-8 characters occupy multiple bytes.
With some encodings, such as UTF-8, there are two different lengths
associated with a character string: the displayed or rendered length
(for display and printing purposes), and the length required to store
the string in memory or in a record in an RMS file or database field.
That means the string length functions you're presently accustomed to
calling will give you the wrong answer for some of what you're using
them for, such as sizing a record field or padding a column on a
display. That's independent of the encoding metadata.
Sure, there could be descriptors, but all the fields in the existing
32-bit descriptors are already in use, so you'd have to extend the text
data type to also include the encoding, or use a different (and
longer) 32-bit descriptor. BASIC doesn't do 64-bit, though there'd be
similar shuffles there, too. But again, the rendered length is equal to
or less than the stored length in UTF-8 and some other encodings, so a
single length field can't describe both, and you'd also want a way to
store the encoding.
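As a rough illustration of why there's no room, this sketch mirrors the 8-byte layout of the existing 32-bit string descriptor as declared in OpenVMS `descrip.h` (16-bit `dsc$w_length`, 8-bit `dsc$b_dtype`, 8-bit `dsc$b_class`, 32-bit `dsc$a_pointer`) using Python's `struct` module. The longer `DSC32_ENC` layout, the `ENC_UTF8` code and both helper functions are hypothetical, invented here only to show what "a different (and longer) descriptor" carrying the encoding and a separate character count might look like; the address values are made up.

```python
import struct

# Existing 32-bit string descriptor layout (per OpenVMS descrip.h):
#   dsc$w_length  16-bit byte count
#   dsc$b_dtype   8-bit data type (14 = DSC$K_DTYPE_T, text)
#   dsc$b_class   8-bit class (1 = DSC$K_CLASS_S, static)
#   dsc$a_pointer 32-bit address
# Every one of the 8 bytes is already spoken for.
DSC32 = struct.Struct("<HBBI")
DSC_K_DTYPE_T = 14
DSC_K_CLASS_S = 1

# HYPOTHETICAL longer descriptor, not an OpenVMS definition: the same
# four fields followed by a 16-bit encoding code and a 16-bit count of
# characters (code points here, which is still not the rendered width).
DSC32_ENC = struct.Struct("<HBBIHH")
ENC_UTF8 = 1  # hypothetical encoding code


def pack_text_descriptor(byte_length: int, address: int) -> bytes:
    """Pack an existing-style static text descriptor."""
    return DSC32.pack(byte_length, DSC_K_DTYPE_T, DSC_K_CLASS_S, address)


def pack_encoded_descriptor(text: str, address: int) -> bytes:
    """Pack the hypothetical descriptor, carrying both lengths.
    struct raises struct.error if either length exceeds 16 bits."""
    encoded = text.encode("utf-8")
    return DSC32_ENC.pack(len(encoded), DSC_K_DTYPE_T, DSC_K_CLASS_S,
                          address, ENC_UTF8, len(text))
```

With `"na\u00efve"` at a made-up address of `0x10000`, the existing layout can record only the 6-byte length, while the hypothetical one records 6 bytes, 5 characters and the encoding, at the cost of 4 extra bytes that every caller and RTL routine would have to know about.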
The "proper" way to do this in BASIC would be to switch everything over
to UTF-8, and to provide a different set of calls for counting
characters and counting bytes. The downside there is that you'll need
to switch over text editors, the linker will need changes if you allow
UTF-8 symbols, the display devices and the terminal driver and SET and
SHOW TERMINAL would need updates for UTF-8 support, and there'll be
other giblets affected, too. This also easily leaks over into the DCL
interpreter and other pieces of OpenVMS.
Moving OpenVMS forward is no small project, and it's not isolated to
string descriptors. It'll also break existing code. That gets back to
my usual comments on the double-edged nature of compatibility, which
shows up when you finally get around to actually needing whatever was
implemented in a "compatible" fashion, or when you just can't make
changes without breaking existing code. Adding UTF-8 is either going to
break applications, or it's going to be an incredibly convoluted design.
--
Pure Personal Opinion | HoffmanLabs LLC