[Info-vax] Calling $CREPRC in COBOL

Tue Jun 28 08:43:50 EDT 2022

On 6/27/2022 10:46 PM, Craig A. Berry wrote:
> On 6/27/22 9:00 PM, Arne Vajhøj wrote:
>> There are two models for Unicode support.
>>
>> A) UTF-8 internal and UTF-8 external
>>
>> That one is not so difficult to implement.
>>
>> Most existing libraries work.
>>
>> For anything ASCII everything works exactly as before.
>>
>> One needs some string functions to operate on a character index
>> instead of a byte index.
>>
>> But not so difficult.
>>
>> The problem is that a lot of string functionality becomes expensive
>> because every lookup by character index becomes an iteration.
>>
>> B) UTF-16 internal and UTF-8 external
>>
>> That one requires a lot of work.
>>
>> Library support.
>>
>> Application changes.
>>
>> But it is efficient.
>>
>> Which is why C/C++, Java and .NET all chose that path.
> 
> I think they made those choices when UCS-2 was current and everyone
> thought a wider fixed-width encoding would be enough.

Yes.

>                                                       UTF-16 needs all
> of the same varying-width handling that UTF-8

In theory yes.

In practice it is common for applications to support only the BMP.
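
A rough sketch of that gap, in Python (the helper names are made up
for this illustration, not taken from any library): a character outside
the BMP takes two UTF-16 code units, so code that treats one 16-bit
unit as one character is only right for BMP text.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


def is_bmp_only(text: str) -> bool:
    """True if every character lies in the Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


bmp_text = "Vajh\u00f8j"        # every character is in the BMP
emoji_text = "ok \U0001F600"    # U+1F600 lies outside the BMP

# Characters versus UTF-16 code units.
print(len(bmp_text), utf16_code_units(bmp_text))      # 6 6
print(len(emoji_text), utf16_code_units(emoji_text))  # 4 5
```

For the BMP-only string the character index and the code unit index
coincide; for the second string they diverge after the surrogate pair.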

>                                                does but uses twice as
> much memory for the most common characters.

Most applications don't care about the extra memory.
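
For a sense of scale, a small Python sketch (encoded_sizes is a
hypothetical helper written for this example) comparing the encoded
size of the same text in both encodings:

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, without a byte order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Pure ASCII: UTF-16 takes exactly twice the bytes of UTF-8.
print(encoded_sizes("CREPRC"))        # (6, 12)

# One non-ASCII Latin letter: 2 bytes in either encoding, so the gap narrows.
print(encoded_sizes("Vajh\u00f8j"))   # (7, 12)
```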

Arne