[Info-vax] proper file format, attributes for non-binary files served by a web server

Phillip Helbig---undress to reply helbig at astro.multiCLOTHESvax.de
Sat Jan 21 11:20:02 EST 2012


In article <sl4ru8-utb.ln1 at news.sture.ch>, Paul Sture <paul at sture.ch>
writes: 

> I don't have experience of OSU on VMS, but have used CSWS and WASD.
> 
> With one release, CSWS moved to wanting line feed terminated files, and 
> provided a DCL procedure to convert the files in your whole document tree.

Sounds like a reason not to use CSWS!
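(For anyone who needs to do such a conversion by hand for a single
file, CONVERT with an FDL describing Stream_LF does the job.  A
minimal sketch; the FDL name and file names here are just examples.
Create a one-off FDL (end the CREATE input with Ctrl/Z), then run
CONVERT:

$ CREATE STMLF.FDL
FILE
        ORGANIZATION    sequential
RECORD
        CARRIAGE_CONTROL        carriage_return
        FORMAT          stream_lf
$ CONVERT/FDL=STMLF.FDL ROBOTS.TXT ROBOTS.TXT

Since input and output names are the same, this simply writes a new
version of the file, now with Stream_LF record format.)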

> I found robots.txt to be a special case.  I forget whether I was using 
> CSWS or WASD at the time, but I had it as a <CR><LF> terminated file, and 
> the robots validators I found out there choked on it.  Converting it to 
> lf-stream format cured that.  It also explained why Google was going 
> places I'd told it not to.  
> 
> BTW I also found that with any syntax error in robots.txt Google would 
> ignore the file completely and rip through parts of my site I'd told it 
> not to.

As Arne recently replied to a question from me on the OSU list, since 
path matching in robots.txt is case-sensitive, one needs to include all 
possible case combinations of a path, or at least those for which there 
are links somewhere.  In particular, if a directory is browsable, the 
OSU server will return uppercase filenames for its contents, while 
direct links to the same files will use whatever case they were written 
in.
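So if, say, a directory /data/ (a hypothetical path) is off-limits but 
the OSU server's own listings link to it as /DATA/, the file needs 
both spellings; a short sketch:

User-agent: *
Disallow: /data/
Disallow: /DATA/

(The field names such as "Disallow" are matched case-insensitively by 
most robots; it is only the path comparison that is case-sensitive.)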

It would be nice if the access log of the web server indicated whether 
a request came from a robot.  (For the common ones, one can of course 
tell from the user-agent name.)  One could then use SEARCH to come up 
with a list of pages the search engines are hitting.
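In the meantime, provided the log format includes the User-Agent field 
(the Apache "combined" format does; I am not sure the OSU default 
does), something along these lines would work; the agent strings and 
file names are just examples:

$ SEARCH ACCESS.LOG "Googlebot","bingbot","msnbot" /OUTPUT=ROBOTS.LIS

SEARCH matches any of the listed strings by default (/MATCH=OR), so 
ROBOTS.LIS ends up with one log line per robot hit.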



