[Info-vax] character set translation for language accents

Arne Vajhøj arne at vajhoej.dk
Thu Apr 16 20:25:35 EDT 2009


Neil Rieck wrote:
> On Apr 16, 4:15 pm, koeh... at eisner.nospam.encompasserve.org (Bob
> Koehler) wrote:
>> In article <2c7f1359-7d41-419a-9d5a-4e0041c14... at k38g2000yqh.googlegroups.com>, jcwoman1... at hotmail.com writes:
>>> I'm writing an interface between some software that runs on Windows
>>> and my software that runs on VMS, and having a problem with character
>>> set translation.  The Windows software is using a character set that
>>> enables accented characters.  I'm not sure exactly which one (utf-?)
>>> but the user interface for the software is in French.  When it sends
>>> text data through my interface, it's sending the accented characters.
>>> When the data comes into my program on VMS, the accented characters
>>> have been lost/removed.  My software is running on VMS 8.3 on
>>> Integrity.  Is there some way to make it accept/handle the accented
>>> characters properly?
>>    You're going to have to tell us more about how the connection between
>>    Windows and VMS is made, as well as the character sets in use on both
>>    ends.  Standard sockets will not alter bytes, but naked sockets are
>>    not typically used for data that includes font information.
>>
>>    What character set are you using on Windows (this may be chosen by
>>    the software you are running)?  If you are using an MS character set
>>    you should not expect anything else to recognise it.  If you are using
>>    ISO-Latin-1, VMS will recognise that on the assumption it is DEC MCS.  
>>    What are you using on VMS to display the text? X11 windows on VMS
>>    will recognise a great many standard fonts, but perhaps not the ones
>>    you are using.
>>
>>    And beware of things like MS "smart quotes".  These are done in
>>    violation of character set standards and will not show up correctly
>>    on most systems.
> 
> The majority of VMS green screen apps are written for 7-bit
> characters. When sending output to browsers there are two popular
> alternatives: ISO-8859-1 and UTF-8 and these must be declared in a
> meta statement like so:
> 
> <meta http-equiv="Content-Type" content="text/html;
> charset=iso-8859-1">
> 
> In a nutshell, iso-8859-1 characters require one byte each while
> utf-8 can require one or two bytes depending upon what you are trying
> to do.
> 
> Check out these docs for more details:
> http://www.w3schools.com/html/default.asp
> http://www.w3schools.com/tags/ref_charactersets.asp
> http://www.w3schools.com/tags/ref_entities.asp
> http://en.wikipedia.org/wiki/Utf-8

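UTF-8 can even take more than 2 bytes per character if we go to Asian
character sets: most CJK characters need 3 bytes, and the encoding
goes up to 4 bytes per character.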
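
To make the byte counts concrete, here is a minimal Python sketch. The
sample characters and the helper name encoded_length are my own picks for
illustration, not anything from the software discussed in this thread.

```python
# Compare how many bytes one character occupies in ISO-8859-1 and UTF-8.
# The sample characters below are illustrative assumptions.
samples = ["e", "\u00e9", "\u65e5", "\U0001F600"]  # e, e-acute, a CJK character, an emoji


def encoded_length(ch, encoding):
    """Return the byte count of ch in encoding, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in samples:
    latin1 = encoded_length(ch, "iso-8859-1")
    utf8 = encoded_length(ch, "utf-8")
    # Print the code point rather than the character so the output is plain ASCII.
    print(f"U+{ord(ch):04X}  iso-8859-1: {latin1}  utf-8: {utf8}")
```

A result of None in the iso-8859-1 column means the character does not
exist in that character set at all, which is one way accented or
non-Latin text can get lost when two ends disagree on the encoding.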

Arne