[Info-vax] Does OpenVMS Use Unicode?

Wed Jun 15 16:16:58 EDT 2016

On 2016-06-15 at 19:51, Jiri Kaspar wrote:
> On Wednesday, 15 June 2016 13:02:17 UTC+2, Phillip Helbig (undress to
> reply) wrote:
>> In article <njr9r9$ldl$1 at news.albasani.net>, Jan-Erik Soderholm <...>
>> writes:
>
>> EDT will properly change the case of such letters, but searching for
>> A or any "variant" of A such as Á, À, Â, Ã, Ä, Å will match all (and
>> A).
>
> Use the EDT command SET SEARCH EXACT. It will give the behavior you
> expect. People using languages which require another charset encoding
> are not so lucky. There are several encoding issues with editors: EDT
> uses the DEC-MCS charset only. It is possible to adapt it to any
> fixed-length 8-bit charset. Almost all charset-related functions are
> table driven, but there are some hard-coded residues like "XOR 32"
> which need to be fixed. TPU supports ISO 8859-1, DEC-MCS and ASCII. It
> is also table driven, also with some hard-coded residues.
>
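
To illustrate the "XOR 32" remark above: flipping bit 5 swaps case in
ASCII and for most accented letters in ISO 8859-1, but not in every
8-bit charset. The following is only a sketch in Python, not EDT or TPU
code. The function names are made up for illustration, and Python's
"iso8859_1" and "iso8859_2" codecs stand in for the editors' own
tables (DEC-MCS is not available as a Python codec).

```python
def xor32_toggle(byte_value: int) -> int:
    """Hard-coded case swap: flip bit 5 (value 32) of the byte."""
    return byte_value ^ 0x20


def table_toggle(byte_value: int, encoding: str) -> int:
    """Table-driven case swap, using the codec as the lookup table.

    Returns the byte unchanged when the swapped character has no
    single-byte representation in the given encoding.
    """
    ch = bytes([byte_value]).decode(encoding)
    swapped = ch.swapcase()
    try:
        out = swapped.encode(encoding)
    except UnicodeEncodeError:
        return byte_value
    return out[0] if len(out) == 1 else byte_value


# ISO 8859-1: 0xC4 is A-diaeresis, 0xE4 is a-diaeresis, so XOR 32 works.
latin1_ok = xor32_toggle(0xC4) == table_toggle(0xC4, "iso8859_1")

# ISO 8859-1: 0xD7 is the multiplication sign, which has no case, yet
# XOR 32 turns it into 0xF7, the division sign.
latin1_bad = xor32_toggle(0xD7) != table_toggle(0xD7, "iso8859_1")

# ISO 8859-2: 0xA9 is S-caron and its lowercase is 0xB9 (a distance of
# 16, not 32), so XOR 32 lands on 0x89, a control code.
latin2_bad = xor32_toggle(0xA9) != table_toggle(0xA9, "iso8859_2")
```

All three flags come out True, which is the reason the hard-coded
spots have to be replaced by table lookups before an editor can be
moved to another 8-bit charset.
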
>>> But a simple SORT of a textfile gets it wrong. It sorts Ä ->Å ->Ö
>>> while the correct order is Å ->Ä ->Ö .
>>
>> Presumably it sorts them in the order of the corresponding bit
>> values.
>>
>> Your "correct" order is correct in Swedish. In Norwegian (where there
>> is no Ö but there is Ø instead and no Ä but Æ instead) the correct
>> order is Æ ->Ø ->Å.
>>
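
The difference between sorting by bit value and sorting by a national
alphabet can be sketched in a few lines of Python. This is only an
illustration of the idea, not how SORT or NCS work internally. The
names SWEDISH, NORWEGIAN and make_key are made up for the example, and
the alphabets contain only A-Z plus the three extra letters of each
language.

```python
import string
import unicodedata

# Assumed alphabets: A-Z followed by each language's extra letters.
SWEDISH = unicodedata.normalize("NFC", string.ascii_uppercase + "ÅÄÖ")
NORWEGIAN = unicodedata.normalize("NFC", string.ascii_uppercase + "ÆØÅ")


def make_key(alphabet: str):
    """Build a sort key that ranks characters by position in alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(text: str):
        folded = unicodedata.normalize("NFC", text).upper()
        # Characters outside the alphabet sort after it, by code point.
        return [rank.get(ch, len(alphabet) + ord(ch)) for ch in folded]

    return key


words = ["Ö", "Ä", "Å"]

# Plain sorted() compares code points: Ä (0xC4), Å (0xC5), Ö (0xD6).
by_code_point = sorted(words)

# With the Swedish table, Å comes before Ä.
by_swedish = sorted(words, key=make_key(SWEDISH))

# With the Norwegian table, Å comes last.
by_norwegian = sorted(["Å", "Ø", "Æ"], key=make_key(NORWEGIAN))
```

The same input gives a different order depending on which table is
used, so one fixed byte order cannot be right for both languages.
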
>
> Use SORT /COLLATING_SEQUENCE=xx, where xx is a module in
> NCS$LIBRARY.NLB; it can be compiled from source using the NCS utility.
> Collating sequences exist for both Norwegian and Swedish. NLS can use
> sequences in any encoding: just write and compile a definition module
> and you can use it in sort/merge and RMS indexes. You can use UTF-8
> too, but there are some limits in the NLS structure or compiler. If
> there are too many lines in the NLS definition, the NLS compiler
> responds with an error. An ordering of all UTF-8 Latin characters
> exceeds this limit.
>
> Jiri Kaspar Czech Technical University
>

Good pointer!

I had no member called "Swedish", only one called "Swedish_NRC_to_Multi"
and that didn't work.

$ sort/coll=Swedish_NRC_to_Multi MYTEXT.TXT sys$output
%SORT-F-BADKEY, 'SWEDISH_NRC_TO_MULTI' is an invalid keyword
-NCS-E-NOT_CS, name or id is not a CS
$

"Finnish" did work and sorted åäö "correctly".

Could have something to do with the fact that Finland was "East Sweden"
until 1809... :-)

Jan-Erik.