[Info-vax] Calling $CREPRC in COBOL

Mon Jun 27 22:46:22 EDT 2022

On 6/27/22 9:00 PM, Arne Vajhøj wrote:
> On 6/27/2022 6:51 PM, Stephen Hoffman wrote:
>> There is, however, much more to the computing world than what can be 
>> represented in DEC MCS and ISO Latin 1.
>>
>> OpenVMS RMS has support for UTF-8, not that much else does.
>>
>> Referencing those RMS files through encoding hackery is certainly 
>> possible, of course.
>>
>> UTF-8 is something an increasing number of apps have to 
>> deal with.
>>
>> Websites and web servers have been UTF-8 for a number of years.
>>
>> On OpenVMS, most apps ignore UTF-8, and require, assume, or force 
>> arriving data to be ASCII.
>>
>> Unfortunately, that can be names and addresses and other data that 
>> doesn't map to ASCII.
>>
>> Dates too are "fun", but I'll leave that "fun" for another, um, day.
>>
>> Sure, we can all continue to use ASCII and DEC MCS, and can ignore the 
>> whole character encoding issue.
>>
>> And can map "unsupported" string encodings using UUID-generated names 
>> (aliases), as my earlier joke had alluded.
>>
>> Which is about the best option on OpenVMS, if porting is not a 
>> possibility.
>>
>> If you're US based and not working outside of Romance languages, this 
>> is less of an issue.
>>
>> Though I'd consider testing customer-facing data interfaces with UTF-8.
>>
>> This whole thing becomes a non-issue* in environments where UTF-8 is 
>> native.
>>
>> *Mostly. UTF-8 still has some surprises waiting, including the byte 
>> order mark, language-specific sort orders, and non-breaking spaces.
>>
>> Getting to fully native UTF-8 support in the OpenVMS operating system, 
>> tools, and platforms, is unlikely on any reasonable VSI OpenVMS timeline.
> 
> There are two models for Unicode support.
> 
> A) UTF-8 internal and UTF-8 external
> 
> That one is not so difficult to implement.
> 
> Most existing libraries work.
> 
> For anything ASCII everything works exactly as before.
> 
> One needs some string functions to operate on character index
> instead of byte index.
> 
> But not so difficult.
> 
> The problem is that a lot of string functionality becomes expensive
> because every use of a character index becomes an iteration.
> 
> B) UTF-16 internal and UTF-8 external
> 
> That one requires a lot of work.
> 
> Library support.
> 
> Application changes.
> 
> But it is efficient.
> 
> Which is why C/C++, Java and .NET all chose that path.

I think they made those choices when UCS-2 was current and everyone
thought a wider fixed-width encoding would be enough.  UTF-16 needs all
of the same varying-width handling that UTF-8 does but uses twice as
much memory for the most common characters.
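
To make the varying-width point concrete, here is a minimal Python sketch.
It is only an illustration: the helper names utf8_offset and utf16_offset
are made up for this example, and both assume well-formed input.  Each one
finds the byte offset of the Nth code point, and each one has to scan from
the start of the data to do it.

```python
def utf8_offset(data: bytes, index: int) -> int:
    """Byte offset of the index-th code point in well-formed UTF-8.

    Hypothetical helper, for illustration only.
    """
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes are 0b10xxxxxx; any other byte starts a code point.
        if byte & 0xC0 != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError(index)


def utf16_offset(data: bytes, index: int) -> int:
    """Byte offset of the index-th code point in well-formed UTF-16LE.

    Hypothetical helper, for illustration only.
    """
    seen = 0
    pos = 0
    while pos < len(data):
        if seen == index:
            return pos
        unit = int.from_bytes(data[pos:pos + 2], "little")
        # A high surrogate (0xD800-0xDBFF) means this code point takes two units.
        pos += 4 if 0xD800 <= unit <= 0xDBFF else 2
        seen += 1
    raise IndexError(index)


ascii_text = "hello"
mixed_text = "a\U0001F600b"  # 'a', an emoji outside the BMP, 'b'

# Encoded sizes of the ASCII-only string: (UTF-8 bytes, UTF-16LE bytes).
ascii_sizes = (len(ascii_text.encode("utf-8")),
               len(ascii_text.encode("utf-16-le")))

# Byte offset of 'b' (code point index 2) in each encoding.
b_in_utf8 = utf8_offset(mixed_text.encode("utf-8"), 2)
b_in_utf16 = utf16_offset(mixed_text.encode("utf-16-le"), 2)

print(ascii_sizes, b_in_utf8, b_in_utf16)
```

The ASCII-only string takes 5 bytes in UTF-8 and 10 in UTF-16LE.  For the
mixed string, a naive "index times two" lookup in UTF-16 would land at byte
4, in the middle of the surrogate pair, while the scan correctly puts 'b'
at byte 6.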