Pest-bots already scanning...
Hi,
About 21 days ago I registered a new www domain. The site has only been up for an hour or two on/off as it's in development, and a blank page comes up currently when the URL is visited. I went in tonight to change features on the hosting, and thought I'd have a look at the visitor log. It's copied below:

216.144.233.206 - - [22/Nov/2004:01:19:39 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "-"
158-147-185-84.harris.com - - [23/Nov/2004:21:33:41 +0000] "GET / HTTP/1.1" 200 464 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
crawl12-public.alexa.com - - [26/Nov/2004:14:11:30 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "ia_archiver"
crawl12-public.alexa.com - - [26/Nov/2004:14:11:30 +0000] "GET / HTTP/1.0" 200 464 "-" "ia_archiver"

Excuse me if this is stupid, but why are they scanning my robots.txt file? Isn't that the text file that gets you into search engines? How do they extrapolate data from it? Can you take precautions against them? What are they doing?

Yours confused,
Thanks in advance
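For anyone reading a log like the one above, here is a rough Python sketch that splits one such line into fields so that robots.txt requests and user agents are easier to spot. It assumes the lines follow the common Apache "combined" layout, which is what the sample appears to be; the names LOG_PATTERN and parse_line are made up for this illustration.

```python
import re

# Assumed layout (Apache "combined" style): host, two identity fields,
# [timestamp], "request", status, size, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# One of the lines from the log in the post.
SAMPLE = (
    'crawl12-public.alexa.com - - [26/Nov/2004:14:11:30 +0000] '
    '"GET /robots.txt HTTP/1.0" 404 - "-" "ia_archiver"'
)

entry = parse_line(SAMPLE)
if entry and "/robots.txt" in entry["request"]:
    print(entry["host"], "asked for robots.txt as", entry["agent"])
```

The 404 status in those lines simply means no robots.txt file exists on the site yet.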
Why do you say pest?
Hi WG774,
It looks like your site is being crawled by the bot for www.alexa.com. Now, I'd not choose to add the Alexa toolbar to my own machines, but I'm curious: why do you consider its attempt to index your site the actions of a pest?

Regards,
Memetic

P.S. Looking for your robots.txt file means the bot is being well behaved. The following might be of interest: http://www.robotstxt.org/wc/exclusion-admin.html
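To show what "well behaved" means in practice, here is a small Python sketch of the check a polite bot makes before fetching a page, using the standard library's urllib.robotparser. The robots.txt contents and the is_allowed helper are invented for this example; they are not taken from WG774's site, which has no robots.txt at the moment.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Alexa's crawler entirely and keep
# every other bot out of /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if the rules in robots_txt let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


for agent, path in [
    ("ia_archiver", "/"),
    ("Googlebot", "/index.html"),
    ("Googlebot", "/private/notes.html"),
]:
    print(agent, path, is_allowed(agent, path))
```

A bot that never asks for robots.txt skips this check altogether, which is why seeing the request in the log is a good sign rather than a bad one.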
Thanks Memetic.
Guess I overreacted, as I was led to believe by anti-pest software that anything relating to Alexa was sinister... I jumped to the conclusion that they were looking to harvest information on keywords / email addresses and sell it on.
That Anti-Pest software is probably talking about the Alexa tool bar.
The toolbar lets you quickly access the Alexa popularity rankings for sites and other Alexa services. It also lets them track which sites you visit so that they can compile the site rankings, which is why it is considered spyware by some. Personally, as they tell you what it does up front and don't attempt to force you to download it, I'd not consider it malware.

Memetic.
What you want is for the big boys to come and spider you, such as Google etc. When your site is finished, try submitting it to Google (presuming you want the site indexed)!
http://www.google.com/addurl.html

Regards,
Maz
WG774
If you don't want bots trawling your site looking for email addresses, add the following bit of code to the <head> section of your pages:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

I am sure there are unscrupulous types out there that will ignore this, but it is the official way to do it. Bear in mind that it asks every compliant robot, search engines included, not to index the page or follow its links.
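As a rough illustration of how a compliant robot might read that tag, here is a Python sketch built on the standard library's html.parser. The class and function names are made up for this example, and real crawlers will differ in the details.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for item in attributes.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


PAGE = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
found = robots_directives(PAGE)
print("may index:", "noindex" not in found)
print("may follow links:", "nofollow" not in found)
```

Nothing forces a bot to run a check like this, so an address harvester can simply skip it.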