[Info-vax] Does OpenVMS Use Unicode?

Mon Jun 13 07:33:35 EDT 2016

On 2016-06-13 13:04, Jan-Erik Soderholm wrote:
> Den 2016-06-13 kl. 12:30, skrev lawrencedo99 at gmail.com:
>> On Monday, June 13, 2016 at 10:15:54 PM UTC+12, Jan-Erik Soderholm wrote:
>>
> >>> Python uses 7-bit for its basic "string" data type.
>>
>> Python strings are Unicode
>> <https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals>.
>>
>>
>
> Yes, but in Unicode, you cannot encode single-byte characters in the
> "upper" half of the 8-bit space. Anything in the "extended ASCII" part
> will be multi-byte characters in Unicode. So characters like those in
> the example (that you snipped) will create errors in (some parts of)
> Python.

Uh. You are confusing Unicode with the encoding of Unicode characters in 
UTF-8. The first 256 characters in Unicode are identical to ISO 8859-1. 
However, if you choose to encode your string using UTF-8, then yes, UTF-8 
can only encode the low 128 code points as a single byte each.
The rest will require multiple bytes. But that is a property of the UTF-8 
encoding, not of Unicode itself. If you instead use UTF-16, for example, 
then you can definitely encode each of the first 256 characters as a single 
16-bit word, along with every other character in the Basic Multilingual 
Plane (U+0000 to U+FFFF).
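A minimal Python 3 sketch of the size difference, assuming CPython 3.3 or later. The helper name `encoded_sizes` is hypothetical; it only wraps the standard `str.encode` method and counts UTF-8 bytes versus UTF-16 code units (16-bit words).

```python
def encoded_sizes(ch):
    """Return (UTF-8 bytes, UTF-16 code units) needed for one character.

    encoded_sizes is a hypothetical helper used only for illustration.
    """
    utf8_bytes = len(ch.encode("utf-8"))
    # 'utf-16-le' avoids the 2-byte byte-order mark that plain 'utf-16' prepends,
    # so dividing by 2 gives the number of 16-bit code units.
    utf16_units = len(ch.encode("utf-16-le")) // 2
    return utf8_bytes, utf16_units


for ch in ["A", "\u00e5", "\u20ac"]:  # U+0041, U+00E5 (a with ring), U+20AC (euro sign)
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# The first 256 code points coincide with ISO 8859-1 (Latin-1):
# every byte value decoded as Latin-1 gives the code point with the same number.
assert all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))
```

All three characters fit in one 16-bit unit, while UTF-8 needs 1, 2 and 3 bytes respectively.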

Unicode itself is not an encoding. It's a coded character set with lots and 
lots of characters. As mentioned, the code points range from 0x0 to 0x10FFFF. 
Exactly how you represent them in memory is a different story. There 
have been many different encoding schemes... The most obvious and easy 
one is to just use 32 bits for each character (UTF-32). But that is a bit 
wasteful most of the time...
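A short Python 3 sketch of the code space and of the cost of a fixed 32-bit encoding, assuming CPython 3.3 or later (where `sys.maxunicode` is always 0x10FFFF). The names `is_valid_code_point` and `size_comparison` are hypothetical helpers for illustration only.

```python
import sys

# In CPython 3.3+ this is always 0x10FFFF, the top of the Unicode code space.
assert sys.maxunicode == 0x10FFFF


def is_valid_code_point(cp):
    """Hypothetical helper: True if chr() accepts cp, i.e. 0x0 <= cp <= 0x10FFFF."""
    try:
        chr(cp)
        return True
    except ValueError:
        return False


def size_comparison(text):
    """Hypothetical helper: (UTF-8 size, UTF-32 size) in bytes, no byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))


print(is_valid_code_point(0x10FFFF), is_valid_code_point(0x110000))  # True False
print(size_comparison("OpenVMS"))  # (7, 28): plain ASCII text is 4x larger in UTF-32
```

For mostly-ASCII text the fixed-width form quadruples the size, which is the waste mentioned above.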

And Python is using Unicode, as illustrated by your code. What do you 
think u'\xe5' means?
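For reference, a minimal Python 3 check of what `u'\xe5'` denotes, assuming Python 3.3 or later (where the `u` prefix is accepted and redundant). It is one character, the code point U+00E5, and it only becomes bytes once an encoding is chosen.

```python
import unicodedata

s = u'\xe5'  # the u prefix is accepted (and redundant) in Python 3.3+

print(len(s))               # 1: one character, not one byte
print(hex(ord(s)))          # 0xe5: the code point U+00E5
print(unicodedata.name(s))  # LATIN SMALL LETTER A WITH RING ABOVE
print(s.encode("latin-1"))  # b'\xe5': one byte in ISO 8859-1
print(s.encode("utf-8"))    # b'\xc3\xa5': two bytes in UTF-8
```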

	Johnny