[Info-vax] proper file format, attributes for non-binary files served by a web server

Paul Sture paul at sture.ch
Thu Jan 26 11:21:44 EST 2012


On Sat, 21 Jan 2012 16:20:02 +0000, Phillip Helbig---undress to reply
wrote:

> As Arne recently replied to a question from me on the OSU list, since
> robots.txt is case-sensitive, one needs to include all possible case
> combinations in it, or at least those for which there are links
> somewhere.  In particular, if a directory is browsable, the OSU server
> will return uppercase filenames for the contents, while direct links to
> these will be however they were written.

Good point, and one I hadn't really considered, obvious though it may be.
Since I moved my hosting to FreeBSD, where file names are case-sensitive,
I have tried to keep all my content in lower case.
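
To illustrate the case problem: a robots.txt rule matches only the exact
spelling, and a path with n letters has 2^n possible spellings, so listing
them all is practical only for short names.  The Python sketch below is
an illustration under assumptions, not anything the OSU server provides;
the function names case_variants and disallow_lines are hypothetical.  It
enumerates every spelling of a short path and, for general use, emits just
the three spellings most likely to be linked: as written, all lower case,
and all upper case (as in a directory listing).

```python
from itertools import product


def case_variants(path):
    """Return every upper/lower-case spelling of path, sorted."""
    # Letters contribute two choices (upper, lower); characters such as
    # '/', '.' and digits contribute one, so they never multiply the count.
    choices = [sorted({ch.lower(), ch.upper()}) for ch in path]
    return sorted("".join(combo) for combo in product(*choices))


def disallow_lines(paths):
    """Build Disallow lines for the spellings most likely to be linked.

    Hypothetical helper: covers the path as written, all lower case, and
    all upper case, rather than all 2**n combinations.
    """
    lines = []
    for path in paths:
        # dict.fromkeys removes duplicates while keeping the order.
        for variant in dict.fromkeys([path, path.lower(), path.upper()]):
            lines.append("Disallow: " + variant)
    return lines
```

Calling disallow_lines(["/Private/"]) yields three Disallow lines, for
/Private/, /private/ and /PRIVATE/.  The full enumeration grows quickly:
case_variants("/robots") already returns 64 spellings.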
 
> It would be nice if the access log of the web server provided an
> indication of whether something was a robot or not.  (For the common
> ones, one can see this from the name, of course.)  One could then use
> SEARCH to come up with a list of pages the search engines are hitting.

AWStats does indicate how many hits come from robots, even if it doesn't
recognize their names, and separates them from "real hits" in the
results.  I haven't looked at the code to see how it determines whether a
hit comes from a robot or not.
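
For the "list of pages the search engines are hitting" idea, here is a
rough Python sketch.  It is not how AWStats does it (as said, I haven't
read that code); it is one plausible heuristic under stated assumptions.
It assumes the log is in the common "combined" format, which may not match
the OSU server's own log layout, and it treats a hit as a robot if the
user-agent contains one of a few hint strings or if the same host fetched
robots.txt.  The names LOG_LINE, ROBOT_HINTS and robot_paths are
hypothetical.

```python
import re

# Assumed "combined" log format:
# host ident user [date] "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical hint list; real robots may use other names entirely.
ROBOT_HINTS = ("bot", "crawl", "spider", "slurp")


def robot_paths(lines):
    """Return the sorted set of paths requested by likely robots."""
    # Lines that do not match the assumed format are skipped.
    entries = [m.groupdict() for m in map(LOG_LINE.match, lines) if m]

    # Any host that asked for robots.txt is treated as a robot throughout.
    robot_hosts = {
        e["host"] for e in entries if e["path"].lower() == "/robots.txt"
    }

    hits = set()
    for e in entries:
        agent = e["agent"].lower()
        if e["host"] in robot_hosts or any(h in agent for h in ROBOT_HINTS):
            hits.add(e["path"])
    return sorted(hits)
```

Feeding it the lines of an access log, for example
robot_paths(open("access.log")) with a hypothetical file name, gives the
distinct paths, with case preserved, that would need entries in
robots.txt.  A robot that neither identifies itself nor fetches
robots.txt will be missed by this heuristic.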

-- 
Paul Sture


