[Info-vax] Open Source on OpenVMS - A Progress Report

Wed Oct 21 04:44:44 EDT 2009

In article
<7efe77bd-4241-4b67-8dbc-85bd5e0a554b at p9g2000vbl.googlegroups.com>,
MetaEd <metaed at gmail.com> writes: 

> > There is absolutely nothing
> > saying that this newsgroup or any other group should be
> > only supporting characters that happen to be in the
> > *English* alphabet or that it must be 7-bit plain ASCII.
> 
> Actually, there is: RFC 1036. This controls the message format for all
> messages posted to newsgroups. The message format must follow RFC 822
> with some minor modifications. RFC 822 is limited to ASCII (7-bit
> codes).

Right.

> Any message composed with characters that cannot be represented with
> ASCII codes must be stripped of those characters or encoded somehow to
> ASCII before transmission. The de facto standard for encoding text is
> RFC 2045--2049 (MIME).
> 
> As of this writing, Google Groups does not use MIME when all the
> characters of the message can be represented with ASCII codes.
> 
> Otherwise, if the message can be represented with Latin-1 (ISO-8859-1)
> codes, Google Groups does so, and encodes with MIME using Quoted-
> Printable. Because Latin-1 is an ASCII superset, and because Quoted-
> Printable preserves most ASCII codes, this causes ASCII to be used to
> encode the message for transmission wherever possible. Other
> characters are encoded with a hex notation. Long lines are also
> preserved using a line continuation code. So, despite being encoded,
> these messages are pretty easy to comprehend using a newsreader that
> lacks MIME support.

I have an EDT macro which does the decoding (see below).
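What MetaEd describes is easy to see with Python's standard quopri module (Python is just a convenient choice here, and the sample text is made up). Encoding a Latin-1 string turns each non-ASCII byte into =XX and leaves everything else as ASCII. A line longer than 76 characters is split with a trailing "=" (a soft line break), which the decoder removes again.

```python
import quopri

# Made-up sample text; every accented character exists in Latin-1.
text = "Grüße aus Köln, 20 °C"
encoded = quopri.encodestring(text.encode("latin-1"))
print(encoded.decode("ascii"))  # Gr=FC=DFe aus K=F6ln, 20 =B0C

# 40 characters become 120 after encoding, so the encoder has to
# insert soft line breaks ("=" at the end of a line).
long_text = "ä" * 40
wrapped = quopri.encodestring(long_text.encode("latin-1"))
print(wrapped.decode("ascii"))

# Decoding restores the original Latin-1 bytes exactly.
assert quopri.decodestring(encoded).decode("latin-1") == text
assert quopri.decodestring(wrapped).decode("latin-1") == long_text
```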

> But if the message cannot be represented with Latin-1 codes, Google
> Groups uses UTF-8 codes, and encodes with MIME using Base64. UTF-8 and
> Base64 are too different from ASCII for such messages to be
> comprehended easily using a newsreader that lacks MIME support.

One can extract them and run B64DECODE.EXE on them.  However, such 
messages USUALLY have no place in a newsgroup in the first place.
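The Google Groups behaviour MetaEd describes boils down to a three-way choice. Here it is as a Python sketch; it is a reconstruction of that description, not Google's actual code, and choose_encoding is a hypothetical name. The last lines do roughly what B64DECODE.EXE does on VMS: turn the Base64 body back into bytes, which must then be read as UTF-8.

```python
import base64
import quopri


def choose_encoding(body: str) -> tuple[str, str, bytes]:
    """Return (charset, transfer encoding, payload) as described in the thread."""
    if body.isascii():
        # Plain 7-bit ASCII: no MIME encoding at all.
        return "us-ascii", "7bit", body.encode("ascii")
    try:
        raw = body.encode("latin-1")
    except UnicodeEncodeError:
        # Not representable in Latin-1: UTF-8, wrapped in Base64
        # (encodebytes breaks the output into 76-character lines).
        return "utf-8", "base64", base64.encodebytes(body.encode("utf-8"))
    # Representable in Latin-1: quoted-printable, mostly readable ASCII.
    return "iso-8859-1", "quoted-printable", quopri.encodestring(raw)


for sample in ["Hello", "Grüße", "Привет"]:
    charset, cte, payload = choose_encoding(sample)
    print(charset, cte, payload)

# Decoding an extracted Base64 body by hand.
charset, cte, payload = choose_encoding("Привет")
print(base64.decodebytes(payload).decode(charset))
```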

> The attribution line which Google Groups creates in the body (for
> example: "On Oct 20, 2:06 pm, MetaEd <met... at gmail.com> wrote")
> contains a Latin-1 non-breaking space (code A0) between the minutes
> and the "am" or "pm". This is a character which does not exist in
> ASCII.
> 
> As a courtesy to readers having no MIME support, posters can replace
> the non-breaking space with a plain space. This will avoid MIME
> encoding, as long as the message has no other non-ASCII characters.

Good suggestion.
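MetaEd's suggestion can be applied mechanically before posting. A minimal Python sketch, assuming the body is already available as a Unicode string; strip_nbsp and needs_mime are hypothetical helper names.

```python
NBSP = "\u00a0"  # Latin-1 non-breaking space, code A0


def strip_nbsp(body: str) -> str:
    """Replace every non-breaking space with a plain ASCII space."""
    return body.replace(NBSP, " ")


def needs_mime(body: str) -> bool:
    """True if the body still contains any non-ASCII character."""
    return not body.isascii()


attribution = "On Oct 20, 2:06\u00a0pm, MetaEd <met... at gmail.com> wrote:"
fixed = strip_nbsp(attribution)
print(needs_mime(attribution), needs_mime(fixed))  # True False
```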

> And, as a courtesy to posters who are spelling names and places
> properly using non-ASCII codes, readers can learn to read MIME encoded
> messages or use a newsreader that has MIME support.

Something which breaks the RFC but causes few if any problems for most 
people, whatever newsreader folks are using, is to use 8-bit characters 
WITHOUT encoding.  This is analogous to doing so in VMS MAIL (but don't 
forget to set the transport to 8-bit in the SMTP configuration).  Any 
newsreader which has fancy features will probably assume ISO-LATIN-1 and 
get it right, as will many WITHOUT fancy features.  Such codes can be 
entered from a VMS keyboard with the compose key.  If you have FORTRAN 
installed, do HELP FORT CHAR DEC to get the DEC multinational set (which 
is almost ISO-LATIN-1):

          +------------------------------------------+
          |     8     9      A   B   C   D   E   F   |
          +---+--------------------------------------+
          | 0 |       DCS        °   À       à       |
          | 1 |       PU1    ¡   ±   Á   Ñ   á   ñ   |
          | 2 |       PU2    ¢   ²   Â   Ò   â   ò   |
          | 3 |       STS    £   ³   Ã   Ó   ã   ó   |
          | 4 | IND   CCH            Ä   Ô   ä   ô   |
          | 5 | NEL   MW     ¥   µ   Å   Õ   å   õ   |
          | 6 | SSA   SPA        ¶   Æ   Ö   æ   ö   |
          | 7 | ESA   EPA    §   ·   Ç   ×   ç   ÷   |
          | 8 | HTS          ¨       È   Ø   è   ø   |
          | 9 | HTJ          ©   ¹   É   Ù   é   ù   |
          | A | VTS          ª   º   Ê   Ú   ê   ú   |
          | B | PLD   CSI    «   »   Ë   Û   ë   û   |
          | C | PLU   ST         ¼   Ì   Ü   ì   ü   |
          | D | RI    OSC        ½   Í   Ý   í   ý   |
          | E | SS2   PM             Î       î       |
          | F | SS3   APC        ¿   Ï   ß   ï       |
          +---+--------------------------------------+
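Why unencoded 8-bit text usually comes out right: a newsreader that assumes ISO-LATIN-1 simply maps each byte to the character at that position. A small Python illustration, using only positions from the table where the DEC multinational set and ISO-LATIN-1 agree (the byte string itself is made up):

```python
# Raw 8-bit bytes as they might appear in an unencoded posting.
# Positions from the table: C4, D6, DC, E4, F6, FC, DF.
raw = bytes([0xC4, 0xD6, 0xDC, 0xE4, 0xF6, 0xFC, 0xDF])

# Assuming ISO-LATIN-1, each byte is exactly one character.
print(raw.decode("latin-1"))  # ÄÖÜäöüß

# The same bytes are not valid UTF-8, so a reader that insists on
# UTF-8 can only substitute replacement characters.
print(raw.decode("utf-8", errors="replace"))
```

Since arbitrary 8-bit text like this is usually not valid UTF-8, a single-byte character set is the natural guess, and ISO-LATIN-1 is the obvious one.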

! quoted printable
!
! Create a buffer with two blank lines (for some reason one
! blank line is not enough???)
!
find buffer cr_buffer
insert;
insert;
find last
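!
! Macro KQP runs the line-mode commands held in the buffer KQP;
! each INSERT below adds one such command to that buffer.
!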
DEFINE MACRO KQP
FIND BUFFER KQP
INSERT;s|=2c|,|w
INSERT;s|=FC|ü|w
INSERT;s|=DF|ß|w
INSERT;s|=F6|ö|w
INSERT;s|=E4|ä|w
INSERT;s|=A0| |w
INSERT;s|=91|`|w
INSERT;s|=92|'|w
INSERT;s|=5F|_|w
INSERT;s|=C4|Ä|w
INSERT;s|=D6|Ö|w
INSERT;s|=DC|Ü|w
INSERT;s|=BA|º|w
INSERT;s|=95|·|w
INSERT;s|=2E|.|w
INSERT;s|=2D|-|w
INSERT;s|=E9|é|w
INSERT;s|=E1|á|w
INSERT;s|=C1|Á|w
INSERT;s|=E8|è|w
INSERT;s|=93|<I>|w
INSERT;s|=94|</I>|w
INSERT;s|=E5|å|w
INSERT;s|=96|---|w
! =20 is an encoded space
INSERT;s|=20| |w
! Linefeed/CR Combination
INSERT;%B
INSERT;change; 9999('=0A=0D' cutsr paste=cr_buffer) ex
! CR/Linefeed Combination
INSERT;%B
INSERT;change; 9999('=0D=0A' cutsr paste=cr_buffer) ex
! Linefeed
INSERT;%B
INSERT;change; 9999('=0A' cutsr paste=cr_buffer) ex
! CR
INSERT;%B
INSERT;change; 9999('=0D' cutsr paste=cr_buffer) ex
! Decode =3D (an encoded "=") last, so that the "=" it produces
! cannot combine with the following text into a new escape
INSERT;s|=3D|=|w
INSERT;%B
find last
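For comparison, the same job can be done outside EDT. Below is a minimal Python sketch (KQP_MAP and kqp are hypothetical names) that applies the macro's substitution table in a single pass with a regular expression. It assumes that pasting cr_buffer amounts to one line break, and it adds one thing the macro does not do: it removes soft line breaks (a trailing "=").

```python
import re

# The substitution table of the KQP macro. Latin-1 escapes become the
# Latin-1 character; the Windows-1252 quotes and dashes (=91 to =96)
# get the same stand-ins the macro uses.
KQP_MAP = {
    "2C": ",", "2D": "-", "2E": ".", "3D": "=", "5F": "_", "20": " ",
    "A0": " ", "BA": "º", "C1": "Á", "C4": "Ä", "D6": "Ö", "DC": "Ü",
    "DF": "ß", "E1": "á", "E4": "ä", "E5": "å", "E8": "è", "E9": "é",
    "F6": "ö", "FC": "ü",
    "91": "`", "92": "'", "93": "<I>", "94": "</I>", "95": "·", "96": "---",
    "0A": "\n", "0D": "\n",
}

# At each "=", the alternatives are tried left to right: a soft line
# break, then a CR/LF or LF/CR pair, then any other two-digit escape.
ESCAPE = re.compile(
    r"(?P<soft>=\r?\n)|(?P<pair>=0D=0A|=0A=0D)|=(?P<hex>[0-9A-F]{2})",
    re.IGNORECASE,
)


def kqp(text: str) -> str:
    """Decode the escapes in KQP_MAP in one left-to-right pass.

    Each escape is decoded exactly once, so an encoded "=" (=3D) can
    never join the text after it to form a new escape. Escapes not in
    KQP_MAP are left unchanged.
    """

    def replace(m: re.Match) -> str:
        if m.group("soft"):
            return ""  # soft line break: join the two lines
        if m.group("pair"):
            return "\n"  # CR/LF pair: a single line break
        return KQP_MAP.get(m.group("hex").upper(), m.group(0))

    return ESCAPE.sub(replace, text)


print(kqp("Gr=FC=DFe =93quoted=94 a=3DE9 b"))  # Grüße <I>quoted</I> a=E9 b
```

Because every escape is matched once in the original text, the order of the entries in the table does not matter.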