[Info-vax] Micro Focus to be acquired by Open Text

Fri Sep 2 10:37:24 EDT 2022

On 2022-09-02 15:42, Arne Vajhøj wrote:
> On 9/2/2022 9:05 AM, Simon Clubley wrote:
>> On 2022-09-01, Jan-Erik Söderholm <jan-erik.soderholm at telia.com> wrote:
>>> Just read up a little.
>>>
>>> https://en.wikipedia.org/wiki/National_Replacement_Character_Set
> 
>> Professionally, I've grown up with 8-bit character sets and then, later,
>> UTF-8, so seeing this earlier standard looks really weird and alien to 
>> me.
> 
>> The idea that the same 7-bit character position can mean different
>> things in adjacent countries (such as Norway and Sweden) is indeed a
>> very alien idea to me and that would mean a company operating in both
>> countries would have some serious data interchange issues.
> 
> ISO-8859 has the same basic issue. Multiple meanings of same
> code - not per country like for ISO-646 but per region.
> 
> One country could be using ISO-8859-1 (western europe) and
> the neighbor country could be using ISO-8859-2 (eastern europe).

Very good point. There is nothing old/weird here. ISO-646 is no 
different than ISO-8859 really. Same value can mean different things, 
depending on which character set we're talking about.
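
To make that concrete, here is a small sketch in Python. The two
ISO-8859 codecs ship with Python; the two ISO-646 tables are my own
hand-written excerpts (an assumption for illustration, covering only the
bracket/brace positions, while the real national variants differ in a
few other places too), and decode_646 is a hypothetical helper.

```python
# One byte value decoded under two ISO-8859 parts (built-in codecs).
raw = bytes([0xA3])
as_latin1 = raw.decode("iso8859-1")  # Western European part
as_latin2 = raw.decode("iso8859-2")  # Central/Eastern European part

# Hand-written excerpts of two ISO-646 national variants (illustrative
# assumption: only the six positions that replace [ \ ] { | }).
ISO646_SE = {0x5B: "Ä", 0x5C: "Ö", 0x5D: "Å", 0x7B: "ä", 0x7C: "ö", 0x7D: "å"}
ISO646_NO = {0x5B: "Æ", 0x5C: "Ø", 0x5D: "Å", 0x7B: "æ", 0x7C: "ø", 0x7D: "å"}


def decode_646(data: bytes, table: dict) -> str:
    """Decode 7-bit data, falling back to plain ASCII outside the table."""
    return "".join(table.get(b, chr(b)) for b in data)


# The same three 7-bit codes, read in Sweden and in Norway.
text_se = decode_646(b"[\\]", ISO646_SE)
text_no = decode_646(b"[\\]", ISO646_NO)

print(as_latin1, as_latin2)
print(text_se, text_no)
```

Either way, the bytes alone do not say which table applies; that has to
be known from somewhere else.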

And as someone mentioned UTF-8, they really need to understand that 
UTF-8 is not a character set at all, but a way of encoding large 
values as sequences of 8-bit quantities. What they most likely mean is 
Unicode, whose first 256 code points are actually identical to 
ISO-8859-1.

It's just that if you encode Unicode using UTF-8, then a character like 
Ä becomes two bytes, yes. But the code point value (0xC4) is actually 
identical to the one in ISO-8859-1.
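
A quick way to check this, as a minimal sketch using only Python's
built-in codecs (the variable names are just mine):

```python
ch = "\N{LATIN CAPITAL LETTER A WITH DIAERESIS}"  # the character Ä

code_point = ord(ch)                   # Unicode code point, 0xC4
latin1_bytes = ch.encode("iso8859-1")  # one byte with that same value
utf8_bytes = ch.encode("utf-8")        # two bytes, 0xC3 0x84

# All 256 ISO-8859-1 byte values decode to the code point of equal value.
same_first_256 = all(
    bytes([i]).decode("iso8859-1") == chr(i) for i in range(256)
)

print(hex(code_point), latin1_bytes, utf8_bytes, same_first_256)
```

So the character's number is the same in both; only the way that number
is packed into bytes differs.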

And as far as data interchange issues go, yes. It can be sort of a 
problem if a text has a character that does not exist in another 
character set. And there is no way to solve this properly. Same problem 
with the different ISO-8859 character sets. This is one reason Unicode 
was created: so that all characters could be represented 
uniquely. In my opinion Unicode sort of failed, though, because they 
instead fell into the trap of the same character ending up with 
multiple code points for typographic reasons, traditional 
silliness, or just plain stupidity. (Unicode shouldn't really care about 
typographic issues, but it sometimes does.)
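
One example of what I mean, sketched with Python's standard unicodedata
module (the choice of Å as the example and the variable names are mine):

```python
import unicodedata

# Three different code point sequences that all display as the letter Å.
forms = {
    "precomposed letter": "\u00c5",   # LATIN CAPITAL LETTER A WITH RING ABOVE
    "angstrom sign": "\u212b",        # ANGSTROM SIGN
    "A + combining ring": "A\u030a",  # base letter plus COMBINING RING ABOVE
}

# Compared as raw strings, they are three distinct values.
distinct_raw = len(set(forms.values()))

# Canonical normalization (NFC) folds all of them into one form.
normalized = {unicodedata.normalize("NFC", s) for s in forms.values()}

print(distinct_raw, normalized)
```

So a plain string comparison or search will miss matches unless both
sides have been normalized first.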

   Johnny