[Info-vax] Does OpenVMS Use Unicode?

hb end.of at inter.net
Mon Jun 13 04:57:27 EDT 2016


On 06/13/2016 05:03 AM, John E. Malmberg wrote:
> VMS ODS-5 stores Unicode in UCS-2 internally and uses its own VTF-7
> encoding to access it as ASCII characters.

ODS-5 supports filenames which contain ODS-2, ISO Latin-1 or Unicode
plane 0 characters. Internally, ODS-5 flags each filename with a
"Name type", which can be ODS-2, ISL-1 or UCS-2.

ISL-1 is used for filenames with
1) non-printable ASCII characters: 0x00 - 0x1f, 0x7f
2) ASCII characters, which are not allowed in ODS-2 - except lowercase
characters: a - z
3) non-ASCII 8-bit characters: 0x80 - 0xff.

For the three classes above, the DIRECTORY command prints filename characters
1) as ^xx (hexadecimal digits)
2) as ASCII character except those characters which have a special
meaning in the VMS filespec or DCL (for example '[' or '/') and
therefore are escaped with '^'
3) as ISO Latin-1 (not DEC MCS) for all others - except 0xff, which is a
printable ISO Latin-1 character but is still printed as ^ff (which looks
like a minor defectlet to me).
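
The display rules above can be sketched in a few lines of Python. This is
only an illustration of the rules as I described them, not how DIRECTORY is
implemented: the function name directory_display is made up, the set of
escaped special characters holds just the two examples mentioned above, and
the lowercase hex digits simply follow the "^xx" notation used here.

```python
# Hypothetical sketch of the display rules described above; not VMS code.

# Assumption: only the two examples from the text; the real set is larger.
SPECIAL_CHARS = {"[", "/"}


def directory_display(name: bytes) -> str:
    """Render the bytes of an ISL-1 filename following the rules above."""
    out = []
    for b in name:
        if b <= 0x1F or b == 0x7F or b == 0xFF:
            # Rule 1 (non-printable ASCII) plus the 0xff exception of rule 3.
            out.append(f"^{b:02x}")
        elif b < 0x80 and chr(b) in SPECIAL_CHARS:
            # Rule 2: ASCII with a special meaning is escaped with '^'.
            out.append("^" + chr(b))
        else:
            # Remaining ASCII and ISO Latin-1 characters are shown as-is.
            out.append(bytes([b]).decode("latin-1"))
    return "".join(out)
```

With these assumptions, directory_display(b"a[b") gives "a^[b" and a single
0xff byte gives "^ff".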

> This is used in the Pathworks/Advanced Server product and some other
> products used this encoding.
> 
> Until VMS 8.4, the VMS C library did not support translating ODS-2 UCS-2
> format filenames to or from UTF-8 Unix format names.
> 
> That support is turned on via a decc$ feature.  Decc$ features can be
> set via logical names or by a C callable API in programs.

Which, according to the docs, would be enabling
DECC$FILENAME_ENCODING_UTF8, DECC$EFS_CHARSET and whatever is necessary
to ensure that filenames are in "UNIX syntax".

> Until that support was added, the VMS C library could only handle UCS-2
> through VTF-7 encoded filenames.

Which are encoded as "^Uxxxx", where xxxx is the Unicode code point in
hexadecimal.
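
As a rough sketch of that notation in Python: the function name
caret_u_escape is made up, and leaving characters up to 0xff untouched and
using uppercase hex digits are assumptions on my part, not something taken
from the docs. The ISL-1 escaping described earlier is not handled here.

```python
# Hypothetical sketch of the "^Uxxxx" notation; not the VMS implementation.

def caret_u_escape(name: str) -> str:
    """Write plane 0 characters above 0xff as ^Uxxxx (four hex digits)."""
    out = []
    for ch in name:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Only Unicode plane 0 fits into four hex digits.
            raise ValueError(f"U+{cp:X} is outside Unicode plane 0")
        if cp <= 0xFF:
            # Assumption: 8-bit characters are left alone in this sketch.
            out.append(ch)
        else:
            out.append(f"^U{cp:04X}")
    return "".join(out)
```

For example, a name starting with U+20AC followed by ".txt" comes out as
"^U20AC.txt".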

> Now even though the CRTL did not support UCS-2, because ODS-5 can store
> any binary character, the CRTL did inadvertently (and undocumented)
> start supporting UTF-8 when ODS-5 support was added.

I may be wrong, but to me it looks like these filenames were of type
ISL-1 internally in ODS-5, and with UTF-8 support in the CRTL such files
became type UCS-2 internally.
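
To illustrate why such names would fit the ISL-1 type: the UTF-8 encoding
of a non-ASCII character is just a sequence of bytes in the 0x80 - 0xff
range, so a filesystem that stores arbitrary 8-bit characters sees them as
ISO Latin-1. The Python sketch below shows only that byte-level effect; the
function name is made up, and it says nothing about what ODS-5 or the CRTL
actually do.

```python
# Hypothetical illustration of UTF-8 bytes read as ISO Latin-1 characters.

def utf8_name_as_latin1(name: str) -> str:
    """Encode a name as UTF-8, then read the same bytes back as Latin-1."""
    utf8_bytes = name.encode("utf-8")
    # Every byte value is a valid Latin-1 character, so this never fails.
    return utf8_bytes.decode("latin-1")
```

For a name beginning with U+00E9 (e with acute accent), the two UTF-8 bytes
0xc3 0xa9 show up as two separate Latin-1 characters.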

> This was not discovered until people tried moving from the discontinued
> Pathworks product to CIFS, as VTF-7 encoded UCS-2 was not visible to
> CIFS and I do not know how UTF-8 encoded filenames showed up to
> Pathworks if at all.
> 
> How VSI is going to handle these issues, I do not know.  It probably
> will depend on customer feedback.



