[Info-vax] proper file format, attributes for non-binary files served by a web server
Phillip Helbig---undress to reply
helbig at astro.multiCLOTHESvax.de
Sat Jan 21 11:20:02 EST 2012
In article <sl4ru8-utb.ln1 at news.sture.ch>, Paul Sture <paul at sture.ch>
writes:
> I don't have experience of OSU on VMS, but have used CSWS and WASD.
>
> With one release, CSWS moved to wanting line-feed-terminated files, and
> provided a DCL procedure to convert the files in your whole document tree.
Sounds like a reason not to use CSWS!
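(For anyone who does want to do such a conversion by hand, a minimal
sketch of a procedure follows; the FDL contents and the [WWW...] tree
name are my assumptions, not whatever CSWS actually shipped:

   $ ! Describe the target record format in a minimal FDL file.
   $ CREATE STMLF.FDL
   FILE
           ORGANIZATION    sequential
   RECORD
           FORMAT          stream_lf
   $ ! Convert every .HTML file in the (hypothetical) [WWW...] tree.
   $ ! CONVERT writes a new version of each file, so the originals remain.
   $ LOOP:
   $       FILE = F$SEARCH("[WWW...]*.HTML")
   $       IF FILE .EQS. "" THEN GOTO DONE
   $       FILE = F$ELEMENT(0, ";", FILE)   ! drop the version number
   $       CONVERT /FDL=STMLF.FDL 'FILE' 'FILE'
   $       GOTO LOOP
   $ DONE:

Whether one wants to depend on a server that changes its mind about
record formats between releases is another question.)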
> I found robots.txt to be a special case. I forget whether I was using
> CSWS or WASD at the time, but I had it as a <CR><LF>-terminated file, and
> the robots validators I found out there choked on it. Converting it to
> Stream_LF format cured that. It also explained why Google was going
> places I'd told it not to.
>
> BTW I also found that with any syntax error in robots.txt, Google would
> ignore the file completely and rip through parts of my site I'd told it
> not to.
As Arne recently replied to a question from me on the OSU list, since
the paths in robots.txt are case-sensitive, one needs to include all
possible case combinations in it, or at least those for which there are
links somewhere. In particular, if a directory is browsable, the OSU
server will return uppercase filenames for its contents, while direct
links to the same files will be in whatever case they were written.
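A sketch of what such a file can look like (the directory name is made
up; which spellings you actually need depends on how the links to your
site are written):

   User-agent: *
   Disallow: /private/
   Disallow: /PRIVATE/
   Disallow: /Private/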
It would be nice if the access log of the web server indicated whether
a request came from a robot or not. (For the common ones, one can see
this from the user-agent name, of course.) One could then use SEARCH to
come up with a list of pages the search engines are hitting.
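Assuming the log includes the user-agent string (whether it does depends
on the server and on the log format configured), something like this
picks out the hits from the well-known crawlers; the log and output
file names here are just examples:

   $ ! Pull out the log entries that name a known crawler.
   $ ! SEARCH matches any of the listed strings, case-blind by default.
   $ SEARCH ACCESS.LOG "Googlebot", "bingbot", "Slurp" /OUTPUT=ROBOTS.LIS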