[Info-vax] Calling $CREPRC in COBOL
Arne Vajhøj
arne at vajhoej.dk
Sun Jul 3 20:28:31 EDT 2022
On 7/3/2022 4:15 PM, seasoned_geek wrote:
> On Tuesday, June 28, 2022 at 7:43:54 AM UTC-5, Arne Vajhøj wrote:
>> On 6/27/2022 10:46 PM, Craig A. Berry wrote:
>>>> B) UTF-16 internal and UTF-8 external
>>>>
>>>> That one requires a lot of work.
>>>>
>>>> Library support.
>>>>
>>>> Application changes.
>>>>
>>>> But it is efficient.
>>>>
>>>> Which is why C/C++, Java and .NET all chose that path.
>>>
>>> I think they made those choices when UCS-2 was current and everyone
>>> thought a wider fixed-width encoding would be enough.
>> Yes.
>>> UTF-16 needs all
>>> of the same varying-width handling that UTF-8
>> In theory yes.
>>
>> In practice it is common for applications only to support BMP.
>>> does but uses twice as
>>> much memory for the most common characters.
>> Most don't care.
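The variable-width point being debated above can be illustrated in Java, whose String type is a sequence of UTF-16 code units. This is just a minimal sketch: a code point outside the BMP (here U+1F600) occupies two code units (a surrogate pair), while a BMP character occupies one:

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so in UTF-16 it is
        // stored as a surrogate pair of two char code units.
        String s = "\uD83D\uDE00";
        System.out.println(s.length());                                // 2 code units
        System.out.println(s.codePointCount(0, s.length()));           // 1 code point
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 4 bytes in UTF-8
        // A BMP character needs only a single code unit:
        System.out.println("A".length());                              // 1
    }
}
```

Applications that "only support BMP" are those that treat each char as one character and so mishandle the surrogate-pair case shown here.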
>
> Actually most do care about the memory consumption. Especially when
> it cascades out to disk that is too small. They also care about the
> overhead of CHAR processing.
Developers, at least in the statically typed, compiled subset of
languages, prefer languages that work that way.
(Scripting languages, for whatever reason, often use UTF-8 internally.)
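The memory-consumption difference the thread argues about is easy to measure. A small sketch in Java (the sample text is arbitrary): for ASCII-only data, UTF-16 takes exactly twice the bytes of UTF-8:

```java
import java.nio.charset.StandardCharsets;

public class SizeDemo {
    public static void main(String[] args) {
        String text = "plain ASCII record data";                      // 23 characters
        int utf8  = text.getBytes(StandardCharsets.UTF_8).length;     // 1 byte per char
        int utf16 = text.getBytes(StandardCharsets.UTF_16BE).length;  // 2 bytes per char
        System.out.println(utf8);   // 23
        System.out.println(utf16);  // 46
    }
}
```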
> C++ will soon follow the path CopperSpice took. They created QChar32
> because we are now out to 32-bit Unicode. UTF-8 and UTF-16 have their
> own hacks for multi-unit characters and that adds processing
> overhead. The 32-bit character approach allows the database/indexed
> file/real data storage that isn't JSON or XML to cleanly do record
> compression for storage and decompression for retrieval without
> making the processor drag the 8-bottom plow of multi-unit character
> processing.
Maybe. I don't follow the C++ standardization process.
But right now UTF-16 in memory and UTF-8 on disk is what
is used (in most of the before-mentioned segment).
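A minimal Java sketch of that convention (the file name and content are placeholders): the string lives in memory as UTF-16, is encoded to UTF-8 on the way to disk, and decoded back to UTF-16 when read:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskDemo {
    public static void main(String[] args) throws IOException {
        String inMemory = "Arne Vajh\u00F8j";        // UTF-16 inside the JVM
        Path tmp = Files.createTempFile("demo", ".txt");
        // Encode to UTF-8 when writing to disk ...
        Files.write(tmp, inMemory.getBytes(StandardCharsets.UTF_8));
        // ... decode back to UTF-16 when reading:
        String readBack = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
        System.out.println(inMemory.equals(readBack)); // true
        Files.delete(tmp);
    }
}
```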
Arne