[Info-vax] proper file format, attributes for non-binary files served by a web server
Paul Sture
paul at sture.ch
Fri Jan 27 08:54:49 EST 2012
On Sat, 21 Jan 2012 16:20:02 +0000, Phillip Helbig---undress to reply
wrote:
Hmm, dunno why my last attempt here suppressed the quotes :-(
>
> As Arne recently replied to a question from me on the OSU list, since
> robots.txt is case-sensitive, one needs to include all possible case
> combinations in it, or at least those for which there are links
> somewhere. In particular, if a directory is browsable, the OSU server
> will return uppercase filenames for the contents, while direct links to
> these will be however they were written.
Good point, and one I hadn't really considered, obvious though it may be.
Since I moved to a FreeBSD solution for hosting I have tried to keep
all my content in lower case.
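
To make the robots.txt point concrete, here is a minimal sketch in Python that emits Disallow lines for each path as written, plus its all-uppercase and all-lowercase forms. It is only an illustration: the function name and the example paths are made up, and it does not attempt every mixed-case combination, only the three forms most likely to turn up in links and in uppercase directory listings.

```python
def disallow_lines(paths):
    """Return robots.txt Disallow lines for each path as written, plus
    its all-uppercase and all-lowercase forms, without duplicates."""
    seen = set()
    lines = []
    for path in paths:
        for variant in (path, path.upper(), path.lower()):
            if variant not in seen:
                seen.add(variant)
                lines.append("Disallow: " + variant)
    return lines


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print("User-agent: *")
    print("\n".join(disallow_lines(["/Private/Notes.txt", "/tmp/"])))
```

Any mixed-case spelling that is actually linked somewhere would still have to be added to the input list by hand.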
> It would be nice if the access log of the web server provided an
> indication of whether something was a robot or not. (For the common
> ones, one can see this from the name, of course.) One could then use
> SEARCH to come up with a list of pages the search engines are hitting.
AWSTATS does indicate how many hits come from robots even if it doesn't
recognize the names, and separates them out from "real hits" in the
results. I haven't looked at the code to see how it determines whether a
hit comes from a robot or not.
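
For the "list of pages the search engines are hitting" idea, a rough sketch in Python follows. This is not how AWSTATS does it (as said, I haven't read that code). It simply assumes the log is in the common "combined" format with the user-agent as the last quoted field, and treats a hit as a robot if the agent string contains one of a few hint words. The regular expression, the hint list and the function name are all assumptions chosen for illustration, and a log written by another server may well need a different pattern.

```python
import re
from collections import Counter

# Assumed "combined" log format:
#   host ident user [date] "METHOD path PROTO" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Crude heuristic, an assumption: substrings that often appear in robot agents.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def robot_hits(log_lines):
    """Count requested paths for hits whose user-agent looks like a robot."""
    counts = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if m and any(hint in m.group("agent").lower() for hint in ROBOT_HINTS):
            counts[m.group("path")] += 1
    return counts
```

Called as robot_hits(open("access.log")), with the file name being hypothetical, it returns a Counter keyed by path. Robots that send a browser-like agent string would be missed by this heuristic.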
--
Paul Sture