[Info-vax] Does OpenVMS Use Unicode?
Johnny Billquist
bqt at softjar.se
Wed Jun 15 05:42:30 EDT 2016
On 2016-06-15 08:38, lawrencedo99 at gmail.com wrote:
> On Tuesday, June 14, 2016 at 12:37:54 AM UTC+12, Stephen Hoffman wrote:
>
>> String descriptors — a very primitive and limited form of an object —
>> lack any sort of character encoding tag, and the file system similarly
>> lacks encoding-related metadata mechanisms.
>
> That’s not fatal. Anything that was formerly defined to hold ASCII bytes can simply be redefined to be UTF-8. Lots of things on other platforms have done that.
>
> Just so long as the code for handling it is 8-bit clean. :)
Mostly true. Almost all string handling will work just as well if you
suddenly decide that the bytes are UTF-8, with no changes needed.
There are only a few cases where things break:
1) Figuring out string lengths. The old assumption that one byte is one
character is no longer true.
2) String collating. The sort order of strings suddenly becomes very
complex, and you cannot depend on sorting by byte values any more.
3) String comparisons. If you compare two strings, they might actually
be considered equal even though the byte values are totally different.
This is a property of Unicode, but as such, it gets reflected in the
storage of the bytes even if encoded as UTF-8 (this is actually the
issue with point 2 as well).
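
To make the three cases concrete, here is a minimal sketch in Python
(not anything VMS-specific; the helper name utf8_facts and the sample
words are made up for illustration). It compares the precomposed and
decomposed spellings of "é", which Unicode treats as canonically
equivalent, and shows what a plain byte-value sort does to a small
word list.

```python
import unicodedata

def utf8_facts(text):
    """Return (byte length, code point length) of text encoded as UTF-8."""
    return len(text.encode("utf-8")), len(text)

composed = "\u00e9"       # "é" as one precomposed code point
decomposed = "e\u0301"    # "e" followed by a combining acute accent

# 1) Length: neither the byte count nor the code point count is
#    "the number of characters" a user sees (one, in both cases).
composed_lengths = utf8_facts(composed)
decomposed_lengths = utf8_facts(decomposed)

# 3) Comparison: the byte sequences differ, yet the strings are
#    canonically equivalent once both are normalized to the same form.
raw_equal = composed.encode("utf-8") == decomposed.encode("utf-8")
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

# 2) Collation: sorting by byte value puts every uppercase ASCII letter
#    before every lowercase one, and all non-ASCII text after "z".
words = ["zebra", "\u00e9clair", "apple", "Zulu"]
byte_sorted = sorted(words, key=lambda w: w.encode("utf-8"))
```

Normalizing before comparing only handles canonical equivalence. A
proper language-aware sort order needs a real collation implementation,
which this sketch does not attempt.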
So, for things that don't care about the actual content of a string, the
current string descriptors will hold a UTF-8 encoded string just as well
as a current Latin-1 string. No changes.
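
A small sketch of that pass-through case, again in Python rather than
anything VMS-specific (split_fields and the sample record are
hypothetical). Code that only copies bytes around, or splits on an
ASCII delimiter, behaves the same for Latin-1 and UTF-8 input, because
every byte of a multi-byte UTF-8 sequence has the high bit set and so
can never be mistaken for an ASCII character.

```python
def split_fields(raw, delimiter=b","):
    """Split a byte string on an ASCII delimiter without decoding it."""
    return raw.split(delimiter)

record = "caf\u00e9,menu"                 # "café,menu"

latin1_fields = split_fields(record.encode("latin-1"))
utf8_fields = split_fields(record.encode("utf-8"))

# The splitter never looked at the encoding, yet each field still
# decodes correctly in the encoding it arrived in.
latin1_text = [f.decode("latin-1") for f in latin1_fields]
utf8_text = [f.decode("utf-8") for f in utf8_fields]
```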
For code that manipulates and examines strings, there are subtle problems.
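
One example of such a subtle problem, sketched in Python under the
assumption that some routine truncates a string to a fixed byte count
(both function names here are made up). Cutting at an arbitrary byte
offset can split a multi-byte sequence and leave invalid UTF-8 behind.
Backing up past continuation bytes avoids that.

```python
def truncate_bytes(raw, limit):
    """Old-style truncation: keep the first 'limit' bytes."""
    return raw[:limit]

def truncate_utf8(raw, limit):
    """Keep at most 'limit' bytes without splitting a UTF-8 sequence."""
    if len(raw) <= limit:
        return raw
    cut = limit
    # Continuation bytes have the form 0b10xxxxxx. If the first dropped
    # byte is one, the cut is mid-sequence, so back up to its lead byte.
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut]

name = "na\u00efve".encode("utf-8")    # "naïve", 6 bytes for 5 letters

broken = truncate_bytes(name, 3)       # ends in half of "ï"
safe = truncate_utf8(name, 3)          # stops before "ï"

try:
    broken.decode("utf-8")
    broken_is_valid = True
except UnicodeDecodeError:
    broken_is_valid = False
```

This only keeps code points whole. A base letter followed by combining
marks can still be separated, which takes us back to point 3 above.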
Johnny