[Info-vax] Does OpenVMS Use Unicode?

Mon Jun 13 07:36:30 EDT 2016

On 2016-06-13 13:30, Jan-Erik Soderholm wrote:
> On 2016-06-13 at 13:20, Johnny Billquist wrote:
>> On 2016-06-13 12:15, Jan-Erik Soderholm wrote:
>>
>>> We use the Python port to run our web applications. And Python
>>> uses 7-bit for its basic "string" data type. So I simply made
>>> a short function to change into the HTML variants like:
>>>
>>> def html_esc(string):
>>>   tmpx1 = string.replace(u'\xe5','&aring;')
>>>   tmpx1 = tmpx1.replace(u'\xe4','&auml;')
>>>   tmpx1 = tmpx1.replace(u'\xf6','&ouml;')
>>>   tmpx1 = tmpx1.replace(u'\xf8','&oslash;')
>>>   tmpx1 = tmpx1.replace(u'\xd8','&Auml;')
>>>   tmpx1 = tmpx1.replace(u'\xc7',' ')
>>>   tmpx1 = tmpx1.replace(u'[','&Auml;')
>>>   tmpx1 = tmpx1.replace(u']','&Aring;')
>>>   tmpx1 = tmpx1.replace(u'\\','&Ouml;')
>>>   return tmpx1
>>>
>>> Maybe there is something built-in in Python for this also,
>>> I do not know and I never looked for it. This works OK.
>>
>> I don't know how to break this to you gently so... You are not using
>> 7-bit data for your strings. In fact, your code snippet here is clearly
>> looking for characters with the 8th bit set. What you have there is
>> essentially Latin-1; whether you prefer to call it DEC MCS, ISO 8859-1,
>> or Unicode is up to you. But it's definitely not 7-bit data strings...
>>
>>     Johnny
>>
>
> Yes, in *this* part it works, but if the characters are not converted
> it will break later...

I believe you. That is because later parts might not understand the
encoding used in Python... Which in turn comes back to the point about
correctly telling every component which encoding scheme is in use, which
often fails somewhere along the way. By doing the translation you are
doing, you avoid the problem, because you reduce the data to a subset
that is identical whether it is read as UTF-8, 8859-1, or plain 7-bit
ASCII.
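
To illustrate the point, here is a minimal sketch, assuming Python 3.
It uses the standard "xmlcharrefreplace" codec error handler instead of
a hand-written replacement table, so it emits numeric character
references rather than named ones, and it does not remap [, ] and \ the
way the quoted function does. The name html_esc is just borrowed from
the quoted code, and the sample word is my own.

```python
def html_esc(text):
    """Replace every non-ASCII character with a numeric HTML reference."""
    # 'xmlcharrefreplace' is a standard codec error handler: characters the
    # ASCII codec cannot encode are written as &#NNN; instead of failing.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


escaped = html_esc("r\u00e4ksm\u00f6rg\u00e5s")  # sample word with a-umlaut, o-umlaut, a-ring
raw = escaped.encode("ascii")

# After escaping, every byte is below 0x80, so the ASCII, Latin-1 and
# UTF-8 decoders all produce exactly the same text.
assert all(byte < 0x80 for byte in raw)
assert raw.decode("ascii") == raw.decode("latin-1") == raw.decode("utf-8")

print(escaped)  # r&#228;ksm&#246;rg&#229;s
```

Note that this only handles non-ASCII characters; it does not escape
&, < or >, so it is not a general HTML escaping routine.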

	Johnny