[Info-vax] Calling $CREPRC in COBOL
Arne Vajhøj
arne at vajhoej.dk
Mon Jun 27 22:00:29 EDT 2022
On 6/27/2022 6:51 PM, Stephen Hoffman wrote:
> There is, however, much more to the computing world than what can be
> represented in DEC MCS and ISO Latin 1.
>
> OpenVMS RMS has support for UTF-8, not that much else does.
>
> Referencing those RMS files through encoding hackery is certainly
> possible, of course.
>
> Dealing with UTF-8 is something an increasing number of apps have to
> do.
>
> Websites and web servers have been UTF-8 for a number of years.
>
> On OpenVMS, most apps ignore UTF-8, and require / assume / force
> arriving data to ASCII.
>
> Unfortunately, that can be names and addresses and other data that
> doesn't map to ASCII.
>
> Dates too are "fun", but I'll leave that "fun" for another, um, day.
>
> Sure, we can all continue to use ASCII and DEC MCS, and can ignore the
> whole character encoding issue.
>
> And can map "unsupported" string encodings using UUID-generated names
> (aliases), as my earlier joke had alluded.
>
> Which is about the best option on OpenVMS, if porting is not a possibility.
>
> If you're US based and not working outside of Romance languages, this is
> less of an issue.
>
> Though I'd consider testing customer-facing data interfaces with UTF-8.
>
> This whole thing becomes a non-issue* in environments where UTF-8 is
> native.
>
> *Mostly. UTF-8 still has some surprises waiting, including the byte
> order mark, language-specific sort orders, and non-breaking spaces.
>
> Getting to fully native UTF-8 support in the OpenVMS operating system,
> tools, and platforms, is unlikely on any reasonable VSI OpenVMS timeline.
There are two models for Unicode support.
A) UTF-8 internal and UTF-8 external
That one is not so difficult to implement.
Most existing libraries work.
For anything ASCII everything works exactly as before.
One needs some string functions to operate on character index
instead of byte index.
But not so difficult.
The problem is that a lot of string functionality becomes expensive,
because every use of a character index becomes an iteration.
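A minimal sketch of why character indexing turns into a scan, in Python working on raw bytes. The function name utf8_char_offset is made up for illustration and is not from any OpenVMS or standard library:

```python
def utf8_char_offset(data: bytes, char_index: int) -> int:
    """Return the byte offset where code point number char_index starts.

    Hypothetical helper for illustration: it has to walk the bytes from
    the start, because UTF-8 code points are 1 to 4 bytes long.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # any other byte starts a new code point.
        if (byte & 0xC0) != 0x80:
            if seen == char_index:
                return offset
            seen += 1
    if seen == char_index:
        # Index one past the last code point maps to the end of the data.
        return len(data)
    raise IndexError("character index out of range")


name = "Vajhøj".encode("utf-8")      # 6 characters, 7 bytes
ascii_only = b"COBOL"                # pure ASCII: byte index == character index

print(utf8_char_offset(ascii_only, 3))  # same as the byte index
print(utf8_char_offset(name, 5))        # shifted by the 2-byte "ø"
```

For pure ASCII data the loop returns the same number that went in, which is the "works exactly as before" case. The cost shows up because the loop runs on every indexed access.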
B) UTF-16 internal and UTF-8 external
That one requires a lot of work.
Library support.
Application changes.
But it is efficient.
Which is why C/C++, Java and .NET all chose that path.
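A rough sketch of model B, again in Python on raw bytes: convert at the boundary, then index 16-bit code units with plain arithmetic. The function names to_internal, code_unit_at and to_external are hypothetical, and the sketch indexes code units, so characters outside the Basic Multilingual Plane (which take two code units in UTF-16) are not treated specially here:

```python
import struct


def to_internal(external: bytes) -> bytes:
    """Convert UTF-8 input to UTF-16 (little-endian, no BOM) at the boundary."""
    return external.decode("utf-8").encode("utf-16-le")


def code_unit_at(internal: bytes, index: int) -> int:
    """Fetch the 16-bit code unit at index by arithmetic, with no scan."""
    if not 0 <= index < len(internal) // 2:
        raise IndexError("code unit index out of range")
    # Every code unit is exactly 2 bytes, so the offset is simply 2 * index.
    return struct.unpack_from("<H", internal, 2 * index)[0]


def to_external(internal: bytes) -> bytes:
    """Convert the internal UTF-16 form back to UTF-8 for output."""
    return internal.decode("utf-16-le").encode("utf-8")


utf8_in = "Vajhøj".encode("utf-8")
internal = to_internal(utf8_in)

print(hex(code_unit_at(internal, 4)))    # code unit for "ø"
print(to_external(internal) == utf8_in)  # round trip back to UTF-8
```

The conversions at input and output are where the library and application work goes; the indexing itself is a fixed-cost lookup.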
Arne