[Info-vax] character set translation for language accents
Neil Rieck
n.rieck at sympatico.ca
Mon Apr 20 22:23:45 EDT 2009
On Apr 16, 8:25 pm, Arne Vajhøj <a... at vajhoej.dk> wrote:
> Neil Rieck wrote:
> > On Apr 16, 4:15 pm, koeh... at eisner.nospam.encompasserve.org (Bob
> > Koehler) wrote:
> >> In article <2c7f1359-7d41-419a-9d5a-4e0041c14... at k38g2000yqh.googlegroups.com>, jcwoman1... at hotmail.com writes:
> >>> I'm writing an interface between some software that runs on Windows
> >>> and my software that runs on VMS, and having a problem with character
> >>> set translation. The Windows software is using a character set that
> >>> enables accented characters. I'm not sure exactly which one (utf-?)
> >>> but the user interface for the software is in French. When it sends
> >>> text data through my interface, it's sending the accented characters.
> >>> When the data comes into my program on VMS, the accented characters
> >>> have been lost/removed. My software is running on VMS 8.3 on
> >>> Integrity. Is there some way to make it accept/handle the accented
> >>> characters properly?
> >> You're going to have to tell us more about how the connection between
> >> Windows and VMS is made, as well as the character sets in use on both
> >> ends. Standard sockets will not alter bytes, but naked sockets are
> >> not typically used for data that includes font information.
>
> >> What character set are you using on Windows (this may be chosen by
> >> the software you are running)? If you are using an MS character set
> >> you should not expect anything else to recognise it. If you are using
> >> ISO-Latin-1, VMS will recognise that on the assumption it is DEC MCS.
> >> What are you using on VMS to display the text? X11 windows on VMS
> >> will recognise a great many standard fonts, but perhaps not the ones
> >> you are using.
>
> >> And beware of things like MS "smart quotes". These are done in
> >> violation of character set standards and will not show up correctly
> >> on most systems.
>
> > The majority of VMS green screen apps are written for 7-bit
> > characters. When sending output to browsers there are two popular
> > alternatives, ISO-8859-1 and UTF-8, and the one in use must be
> > declared in a meta tag like so:
>
> > <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
>
> > In a nutshell, iso-8859-1 characters require one byte while
> > utf-8 can require one or two bytes depending on the character
> > being encoded.
>
> > Check out these docs for more details:
> > http://www.w3schools.com/html/default.asp
> > http://www.w3schools.com/tags/ref_charactersets.asp
> > http://www.w3schools.com/tags/ref_entities.asp
> > http://en.wikipedia.org/wiki/Utf-8
>
> UTF-8 can even be more than 2 bytes if we go to Asian character sets.
>
> Arne
Oops. You are correct: UTF-8 can use up to four bytes per character.
I should have said, "In my day-to-day work I have never seen more than two."
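To make the byte counts concrete, here is a minimal Python sketch that encodes a few characters in ISO-8859-1 and UTF-8 and reports how many bytes each needs. The sample characters are illustrative choices, not taken from this thread.

```python
# Minimal sketch: how many bytes a character needs in ISO-8859-1 vs UTF-8.
# The sample characters are illustrative, not taken from the thread.
from typing import Optional, Tuple

SAMPLES = [
    "A",           # U+0041, plain ASCII
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\u65e5",      # U+65E5, a CJK ideograph
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]


def byte_lengths(ch: str) -> Tuple[Optional[int], int]:
    """Return (ISO-8859-1 byte count or None if unencodable, UTF-8 byte count)."""
    try:
        latin1 = len(ch.encode("iso-8859-1"))
    except UnicodeEncodeError:
        latin1 = None  # ISO-8859-1 only covers U+0000..U+00FF
    return latin1, len(ch.encode("utf-8"))


if __name__ == "__main__":
    for ch in SAMPLES:
        latin1, utf8 = byte_lengths(ch)
        # Print the code point rather than the glyph to avoid console charset issues.
        print(f"U+{ord(ch):04X}: ISO-8859-1={latin1}, UTF-8={utf8}")
```

Plain ASCII takes one byte in both encodings and an accented Latin letter such as é takes two in UTF-8. The euro sign and most CJK characters take three, and characters outside the Basic Multilingual Plane, such as emoji, take four. Only the first two samples fit in ISO-8859-1 at all.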
NSR
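Bob's points about Microsoft character sets, smart quotes and ISO-Latin-1 come down to one fact: a socket delivers bytes unchanged, so the receiving program must decode them with the same charset the sender used. The following is a minimal Python sketch under two assumptions. First, the Windows side sends Windows-1252, although the original poster did not know which charset was in use. Second, the sample string is hypothetical.

```python
# Minimal sketch: identical bytes mean different things under different charsets.
# Assumes the sender uses Windows-1252 ("cp1252"); the sample text is hypothetical.
from typing import List


def decode_as(data: bytes, charset: str) -> str:
    """Decode bytes using the charset the receiver assumes (strict errors)."""
    return data.decode(charset)


def control_chars(text: str) -> List[str]:
    """List any C1 control characters (U+0080..U+009F) found in text."""
    return [f"U+{ord(c):04X}" for c in text if 0x80 <= ord(c) <= 0x9F]


if __name__ == "__main__":
    sent = "\u201cCaf\u00e9\u201d"  # "Cafe" with e-acute, wrapped in MS-style smart quotes

    # Windows-1252 stores the curly quotes at 0x93 and 0x94 ...
    cp1252_bytes = sent.encode("cp1252")
    print(cp1252_bytes)  # b'\x93Caf\xe9\x94'

    # ... but ISO-8859-1 defines those positions as C1 control codes, not quotes.
    print(control_chars(decode_as(cp1252_bytes, "iso-8859-1")))  # ['U+0093', 'U+0094']

    # UTF-8 bytes read as ISO-8859-1 turn one accented letter into two characters.
    print(decode_as("Caf\u00e9".encode("utf-8"), "iso-8859-1"))  # CafÃ©

    # ISO-8859-1 has no code point for smart quotes at all.
    try:
        sent.encode("iso-8859-1")
    except UnicodeEncodeError as exc:
        print("not representable in ISO-8859-1:", exc.reason)
```

The accented letter survives the Windows-1252 to ISO-8859-1 mismatch because both charsets put é at 0xE9. The smart quotes do not survive it, and a UTF-8 sender read as ISO-8859-1 garbles every non-ASCII character.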