[Info-vax] Does OpenVMS Use Unicode?

Johnny Billquist bqt at softjar.se
Wed Jun 15 05:42:30 EDT 2016


On 2016-06-15 08:38, lawrencedo99 at gmail.com wrote:
> On Tuesday, June 14, 2016 at 12:37:54 AM UTC+12, Stephen Hoffman wrote:
>
>> String descriptors (a very primitive and limited form of an object)
>> lack any sort of character encoding tag, and the file system similarly
>> lacks encoding-related metadata mechanisms.
>
> That’s not fatal. Anything that was formerly defined to hold ASCII bytes can simply be redefined to hold UTF-8. Lots of things on other platforms have done that.
>
> Just so long as the code for handling it is 8-bit clean. :)

Mostly true. Almost all string handling will keep working, with no 
changes, if you simply decide that the bytes are now UTF-8.

There are only a few cases where things break:
1) Figuring out string lengths. The old assumption that one byte is one 
character is no longer true.
2) String collation. The sort order of strings suddenly becomes very 
complex, and you cannot depend on just sorting by byte values any 
more.
3) String comparisons. If you compare two strings, they might actually 
be considered equal even though the byte values are different. This is 
a property of Unicode itself (the same character can be written as 
different code point sequences, for example a precomposed letter versus 
a base letter plus a combining mark), so it carries over to the stored 
bytes when encoded as UTF-8 (this is also part of the issue with 
point 2).
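
As a rough illustration of the three cases above, here is a minimal Python sketch. Python is used only because it is convenient; nothing in it is VMS-specific, and the sample strings and variable names are made up for the example. It uses "é" written two ways, once as a single precomposed code point and once as "e" followed by a combining accent.

```python
import unicodedata

# The same visible character "é", written two different ways
# (both strings are illustrative assumptions, not from any real system).
composed = "\u00e9"      # precomposed: one code point
decomposed = "e\u0301"   # "e" + combining acute accent: two code points

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# 1) Length: byte count, code point count and visible characters differ.
byte_lengths = (len(composed_bytes), len(decomposed_bytes))
code_point_lengths = (len(composed), len(decomposed))

# 2) Collation: sorting by raw byte values puts "éclair" after "zebra",
#    because every non-ASCII character encodes to bytes >= 0x80.
words = ["zebra", "\u00e9clair", "apple"]
byte_sorted = sorted(words, key=lambda w: w.encode("utf-8"))

# 3) Comparison: the byte strings differ, but after normalizing both
#    to the same form (NFC here) the strings compare equal.
bytes_equal = composed_bytes == decomposed_bytes
normalized_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(byte_lengths, code_point_lengths)
print(byte_sorted)
print(bytes_equal, normalized_equal)
```

Normalizing before comparing only handles the equivalence problem; a language-aware sort order needs proper collation rules on top of that, which this sketch does not attempt.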

So, for things that don't care about the actual content of a string, the 
current string descriptors will hold a UTF-8 encoded string just as well 
as a current Latin-1 string. No changes.
For code that manipulates and examines strings, there are subtle problems.
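
The difference between the two kinds of code can be sketched as follows, again in Python purely for illustration. The file name and the "[docs]" prefix are hypothetical. Operations that treat the string as opaque bytes (copying, concatenating, splitting on an ASCII delimiter) are unaffected, while something as simple as truncating at a fixed byte count can cut a character in half.

```python
# A UTF-8 encoded name held as plain bytes, the way a byte-oriented
# descriptor would hold it (the name itself is a made-up example).
name = "na\u00efve.txt".encode("utf-8")   # the "ï" occupies two bytes
directory = b"[docs]"

# Content-agnostic operations: these never look inside the characters.
path = directory + name
stem, _, ext = name.rpartition(b".")      # ASCII bytes never occur inside
                                          # a multi-byte UTF-8 sequence

# Content-sensitive operation: truncating to a fixed number of bytes.
truncated = name[:3]                      # ends in the middle of "ï"
try:
    truncated.decode("utf-8")
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False

print(path.decode("utf-8"), ext, truncation_ok)
```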

	Johnny



