[Info-vax] Calling $CREPRC in COBOL

Sun Jul 3 16:15:58 EDT 2022

On Tuesday, June 28, 2022 at 7:43:54 AM UTC-5, Arne Vajhøj wrote:
> On 6/27/2022 10:46 PM, Craig A. Berry wrote: 
> >> B) UTF-16 internal and UTF-8 external 
> >> 
> >> That one requires a lot of work. 
> >> 
> >> Library support. 
> >> 
> >> Application changes. 
> >> 
> >> But it is efficient. 
> >> 
> >> Which is why C/C++, Java and .NET all chose that path. 
> > 
> > I think they made those choices when UCS-2 was current and everyone 
> > thought a wider fixed-width encoding would be enough.
> Yes.
> >   UTF-16 needs all 
> > of the same varying-width handling that UTF-8
> In theory yes. 
> 
> In practice it is common for applications only to support BMP.
> > does but uses twice as 
> > much memory for the most common characters.
> Most don't care. 

Actually, most do care about memory consumption, especially when it cascades out to a disk that is too small. They also care about the overhead of CHAR processing.
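How much memory each encoding costs depends on the mix of text. The minimal Python sketch below compares encoded sizes for a few sample strings. The sample strings and the helper name `encoded_sizes` are illustrative assumptions, not anything from the thread, and the little-endian codecs are used so no byte-order mark is counted.

```python
# Compare storage cost of the same text in UTF-8, UTF-16 and UTF-32.
# Sample strings and helper name are hypothetical, chosen for illustration.
SAMPLES = ["VMS", "Vajhøj", "日本語", "😀"]


def encoded_sizes(text: str) -> dict[str, int]:
    """Bytes needed to store text in each encoding (no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    print(f"{'text':<8}{'chars':>6}{'utf-8':>7}{'utf-16':>8}{'utf-32':>8}")
    for s in SAMPLES:
        sizes = encoded_sizes(s)
        print(f"{s:<8}{len(s):>6}{sizes['utf-8']:>7}{sizes['utf-16']:>8}{sizes['utf-32']:>8}")
```

ASCII-heavy text such as "VMS" doubles in UTF-16 and quadruples in UTF-32 relative to UTF-8, while CJK text is smaller in UTF-16 than in UTF-8, and a character outside the BMP such as the emoji takes two UTF-16 code units.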

C++ will soon follow the path CopperSpice took. CopperSpice created QChar32 because Unicode has outgrown 16 bits and we now need a 32-bit character type. UTF-8 and UTF-16 each have their own hacks for multi-unit characters, and that adds processing overhead. The 32-bit character approach lets the database, indexed file, or other real data store (anything that isn't JSON or XML) cleanly compress records for storage and decompress them on retrieval, without making the processor drag the 8-bottom plow of multi-unit character processing.
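The multi-unit overhead can be illustrated by looking up the n-th code point. In UTF-8 the data must be scanned from the start, skipping continuation bytes (bit pattern 10xxxxxx); with a fixed 32-bit unit the code point sits at byte offset 4*n. This is a minimal Python sketch with hypothetical function names; it does not model CopperSpice's QChar32 or any record-compression scheme, and it indexes code points, not user-perceived characters built from several code points. Unicode code points stop at U+10FFFF, so 32 bits always holds one.

```python
def nth_code_point_utf8(data: bytes, n: int) -> int:
    """Variable width: scan from the start to find the n-th code point (O(n))."""
    index = -1
    start = None
    for pos, byte in enumerate(data):
        # Any byte that is not a continuation byte (10xxxxxx) starts a new code point.
        if byte & 0xC0 != 0x80:
            index += 1
            if index == n:
                start = pos
            elif index == n + 1:
                return ord(data[start:pos].decode("utf-8"))
    if start is None:
        raise IndexError("code point index out of range")
    return ord(data[start:].decode("utf-8"))


def nth_code_point_utf32(data: bytes, n: int) -> int:
    """Fixed width: the n-th code point lives at byte offset 4*n (O(1))."""
    if not 0 <= n < len(data) // 4:
        raise IndexError("code point index out of range")
    return int.from_bytes(data[4 * n : 4 * n + 4], "little")


if __name__ == "__main__":
    text = "a😀b"
    # UTF-16 needs a surrogate pair for the emoji: 4 code units for 3 code points.
    print(len(text.encode("utf-16-le")) // 2, "UTF-16 code units")
    print(hex(nth_code_point_utf8(text.encode("utf-8"), 1)))
    print(hex(nth_code_point_utf32(text.encode("utf-32-le"), 1)))
```

Both lookups return 0x1f600 for the emoji, but only the UTF-32 version avoids walking the bytes that precede it.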