[Info-vax] Does OpenVMS Use Unicode?

Wed Jun 15 13:51:47 EDT 2016

On Wednesday, 15 June 2016 13:02:17 UTC+2, Phillip Helbig (undress to reply) wrote:
> In article <njr9r9$ldl$1 at news.albasani.net>, Jan-Erik Soderholm
> <...> writes: 

> EDT will properly change the case of such letters, but searching for A
> or any "variant" of A such as Á, À, Â, Ã, Ä, Å will match all (and A).

Use the EDT command: SET SEARCH EXACT
It gives the behavior you expect.
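
To make the difference between the two search modes concrete, here is a minimal Python sketch. It is not how EDT implements searching; the helper names `fold` and `find` are hypothetical, and the folding rule (upper-case, then drop combining marks) is only an assumption that reproduces the "A matches Á, À, Â, Ã, Ä, Å" effect described above.

```python
import unicodedata


def fold(text: str) -> str:
    """Upper-case the text and strip accents, so 'ä', 'Å' and 'a' all become 'A'."""
    decomposed = unicodedata.normalize("NFD", text.upper())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find(line: str, pattern: str, exact: bool = False) -> int:
    """Return the index of the first match in `line`, or -1 if there is none.

    exact=False compares folded text (loose matching);
    exact=True compares the characters as they are.
    Note: indexes refer to the folded text when exact=False, which can differ
    from the original for characters such as 'ß' that expand when upper-cased.
    """
    if exact:
        return line.find(pattern)
    return fold(line).find(fold(pattern))


line = "Malm\u00f6 och \u00c5re"           # "Malmö och Åre"
loose = find(line, "A")                    # matches the plain 'a' in "Malmö"
strict = find(line, "A", exact=True)       # no literal upper-case 'A' present
strict_ring = find(line, "\u00c5", exact=True)  # only the real 'Å' matches
```

With exact matching, a search for A no longer stops at Å or at a lower-case a, which is the behavior the original question asked for.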
People whose languages need a different character set encoding are not so lucky.
There are several encoding issues with the editors:
EDT uses the DEC-MCS charset only. It can be adapted to any fixed-length 8-bit charset: almost all charset-related functions are table driven, but some hard-coded remnants such as "XOR 32" need to be fixed.
TPU supports ISO 8859-1, DEC-MCS and ASCII. It is also table driven, again with some hard-coded remnants.
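
Why a hard-coded "XOR 32" is a problem once the charset changes can be shown with a small Python sketch. It uses ISO 8859-1 as the example charset because Python ships a codec for it; the function names are hypothetical, and nothing here is taken from the EDT or TPU sources. Flipping bit 5 works for the ASCII letters and for most accented letters, but in ISO 8859-1 it also turns ÷ (0xF7) into × (0xD7) and ß (0xDF) into ÿ (0xFF). A lookup table built for the charset avoids that.

```python
def xor_toggle(byte: int) -> int:
    """Hard-coded case flip: only correct where the two cases differ by 0x20."""
    return byte ^ 0x20


def build_upper_table(encoding: str = "latin-1") -> bytes:
    """Build a 256-entry upper-casing table for a fixed-length 8-bit charset."""
    table = bytearray(range(256))          # default: every byte maps to itself
    for b in range(256):
        try:
            upper = bytes([b]).decode(encoding).upper().encode(encoding)
        except UnicodeError:
            continue                       # no upper-case form in this charset
        if len(upper) == 1:                # skip expansions such as ß -> SS
            table[b] = upper[0]
    return bytes(table)


UPPER = build_upper_table()


def table_upper(data: bytes) -> bytes:
    """Upper-case a byte string using the charset-specific table."""
    return data.translate(UPPER)


sample = "\u00e5\u00e4\u00f6\u00f7".encode("latin-1")   # "åäö÷"
xor_result = bytes(xor_toggle(b) for b in sample)       # ÷ wrongly becomes ×
table_result = table_upper(sample)                      # ÷ is left alone
```

Swapping in another charset then means rebuilding the table, with no code changes, which is the advantage of the table-driven parts of the editors.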

> > But a simple SORT of a textfile gets it wrong. It sorts Ä ->Å ->Ö
> > while the correct order is Å ->Ä ->Ö .
> 
> Presumably it sorts them in the order of the corresponding bit values.
> 
> Your "correct" order is correct in Swedish.  In Norwegian (where there 
> is no Ö but there is Ø instead and no Ä but Æ instead) the correct order 
> is Æ ->Ø ->Å.
> 

SORT /COLLATING_SEQUENCE=xx
where xx is a module in NCS$LIBRARY.NLB; it can be compiled from source with the NCS utility. Collating sequences exist for both Norwegian and Swedish.
NCS can use sequences in any encoding: just write and compile a definition module and you can use it in sort/merge and RMS indexes.
You can use UTF-8 too, but there is a limit in the NCS structure or compiler: if the definition contains too many lines, the compiler responds with an error. An ordering of all UTF-8 Latin characters exceeds this limit.
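
What a collating sequence does can be illustrated with a short Python sketch. This is not NCS definition syntax and does not use any OpenVMS facility; the alphabets and the `make_key` helper are assumptions for illustration only. Each language gets its own ordering table, and sorting compares table positions instead of raw character codes.

```python
# Alphabet tails differ: Swedish ends in Å Ä Ö, Norwegian in Æ Ø Å.
SWEDISH = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\u00c5\u00c4\u00d6"
NORWEGIAN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\u00c6\u00d8\u00c5"


def make_key(alphabet: str):
    """Return a sort key that ranks characters by their position in `alphabet`."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    fallback = len(alphabet)               # unknown characters sort last

    def key(word: str):
        return [(rank.get(ch, fallback), ch) for ch in word.upper()]

    return key


swedish_words = ["\u00d6", "\u00c4", "\u00c5"]       # Ö, Ä, Å
norwegian_words = ["\u00d8", "\u00c5", "\u00c6"]     # Ø, Å, Æ

by_code = sorted(swedish_words)                              # Ä, Å, Ö
by_swedish = sorted(swedish_words, key=make_key(SWEDISH))    # Å, Ä, Ö
by_norwegian = sorted(norwegian_words, key=make_key(NORWEGIAN))  # Æ, Ø, Å
```

Sorting by character code gives Ä, Å, Ö, the "wrong" order from the quoted message, while the per-language tables give the Swedish and Norwegian orders mentioned above.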

Jiri Kaspar
Czech Technical University