[Info-vax] Micro Focus to be acquired by Open Text
Johnny Billquist
bqt at softjar.se
Fri Sep 2 10:37:24 EDT 2022
On 2022-09-02 15:42, Arne Vajhøj wrote:
> On 9/2/2022 9:05 AM, Simon Clubley wrote:
>> On 2022-09-01, Jan-Erik Söderholm <jan-erik.soderholm at telia.com> wrote:
>>> Just read up a little.
>>>
>>> https://en.wikipedia.org/wiki/National_Replacement_Character_Set
>
>> Professionally, I've grown up with 8-bit character sets and then, later,
>> UTF-8, so seeing this earlier standard looks really weird and alien to
>> me.
>
>> The idea that the same 7-bit character position can mean different
>> things in adjacent countries (such as Norway and Sweden) is indeed a
>> very alien idea to me and that would mean a company operating in both
>> countries would have some serious data interchange issues.
>
> ISO-8859 has the same basic issue. Multiple meanings of same
> code - not per country like for ISO-646 but per region.
>
> One country could be using ISO-8859-1 (western europe) and
> the neighbor country could be using ISO-8859-2 (eastern europe).
Very good point. There is nothing old/weird here. ISO-646 is really no
different from ISO-8859. The same value can mean different things,
depending on which character set we're talking about.
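
As an illustration of that point, here is a minimal Python sketch. It
assumes the codec names "iso8859_1" and "iso8859_2" from the Python
standard library, and the byte value 0xA3 is simply one convenient
example, not something taken from the thread:

```python
# One byte value, decoded under two different ISO-8859 parts.
raw = bytes([0xA3])

as_latin1 = raw.decode("iso8859_1")  # U+00A3 POUND SIGN
as_latin2 = raw.decode("iso8859_2")  # U+0141 LATIN CAPITAL LETTER L WITH STROKE

# Same stored value, two different characters.
print(hex(raw[0]), repr(as_latin1), repr(as_latin2))
```

The stored byte never changes; only the character set used to interpret
it does.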
And since someone mentioned UTF-8, they really need to understand that
UTF-8 is not a character set at all, but a way of encoding large
values as sequences of 8-bit quantities. What they most likely mean is
Unicode, which is actually identical to ISO-8859-1 for the first 256
code points. It's just that if you encode Unicode using UTF-8, then a
character like Ä becomes two bytes, yes. But the code point value is
identical to the one in ISO-8859-1.
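
A small Python sketch of the Ä example, assuming the standard library
codecs "iso8859_1" and "utf-8" (the variable names are only for
illustration):

```python
ch = "\u00c4"  # Ä, LATIN CAPITAL LETTER A WITH DIAERESIS

code_point = ord(ch)             # the Unicode code point, 0xC4
latin1 = ch.encode("iso8859_1")  # one byte whose value equals the code point
utf8 = ch.encode("utf-8")        # two bytes encoding the same value 0xC4

print(hex(code_point), latin1.hex(), utf8.hex())

# Every ISO-8859-1 byte decodes to the Unicode code point of equal value.
same_first_256 = all(
    bytes([i]).decode("iso8859_1") == chr(i) for i in range(256)
)
print(same_first_256)
```

The character's value is 0xC4 in both cases; only the byte
representation differs once UTF-8 is used.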
And as far as data interchange issues go, yes. It can be sort of a
problem if a text has a character that does not exist in another
character set, and there is no way to solve this properly. The different
ISO-8859 character sets have the same problem. This is one reason
Unicode was created: so that all characters could be represented
uniquely. In my opinion Unicode sort of failed, though, because it
instead fell into the trap of the same character ending up with
multiple code points for typographic reasons, traditional silliness, or
just plain stupidity. (Unicode shouldn't really care about typographic
issues, but it sometimes does.)
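
One commonly cited case of this, sketched in Python with the standard
unicodedata module. The choice of Å is an assumption made for
illustration and is not taken from the thread:

```python
import unicodedata

precomposed = "\u00c5"  # LATIN CAPITAL LETTER A WITH RING ABOVE
angstrom = "\u212b"     # ANGSTROM SIGN
combining = "A\u030a"   # A followed by COMBINING RING ABOVE

forms = [precomposed, angstrom, combining]

# As raw code point sequences, the three spellings all differ.
distinct = len(set(forms))

# NFC normalization maps all three to the same precomposed code point.
normalized = {unicodedata.normalize("NFC", s) for s in forms}

print(distinct, [hex(ord(c)) for s in normalized for c in s])
```

Comparing such strings reliably requires normalizing them first, which
is the kind of extra step a strictly one-code-point-per-character design
would not need.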
Johnny