PPRuNe Forums


WG774 1st Dec 2004 01:01

Pest-bots already scanning...
 
Hi,

About 21 days ago I registered a new www domain.

The site has only been up for an hour or two on/off as it's in development, and a blank page comes up currently when the url is visited.

I went in tonight to change features on the hosting, and thought I'd have a look at the visitor log. It's copied below:


216.144.233.206 - - [22/Nov/2004:01:19:39 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "-"
158-147-185-84.harris.com - - [23/Nov/2004:21:33:41 +0000] "GET / HTTP/1.1" 200 464 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
crawl12-public.alexa.com - - [26/Nov/2004:14:11:30 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "ia_archiver"
crawl12-public.alexa.com - - [26/Nov/2004:14:11:30 +0000] "GET / HTTP/1.0" 200 464 "-" "ia_archiver"

So pests are scanning for my robots.txt file already!

Excuse me if this is stupid, but why are they scanning my robots.txt file? Isn't that the text file that gets you into search engines? How do they extrapolate data from it?

Can you take precautions against them? What are they doing?


Yours confused :confused:

Thanks in advance

Memetic 1st Dec 2004 13:30

Why do you say pest?
 
Hi WG774,

It looks like your site is being crawled by the bot for www.alexa.com
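For anyone who wants to pick the bot requests out of a log like the one above mechanically, here is a minimal Python sketch. It assumes the log is in the usual Apache "combined" layout, which is what the quoted lines look like. The names LOG_PATTERN, parse_line and robots_txt_requesters are made up for illustration, and the sample lines are the ones quoted in the first post.

```python
import re

# Assumed layout: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def robots_txt_requesters(lines):
    """Return (host, user agent) pairs for every request to /robots.txt."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["path"] == "/robots.txt":
            hits.append((entry["host"], entry["agent"]))
    return hits


# The four log lines quoted in the first post.
SAMPLE_LOG = [
    '216.144.233.206 - - [22/Nov/2004:01:19:39 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "-"',
    '158-147-185-84.harris.com - - [23/Nov/2004:21:33:41 +0000] "GET / HTTP/1.1" 200 464 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"',
    'crawl12-public.alexa.com - - [26/Nov/2004:14:11:30 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "ia_archiver"',
    'crawl12-public.alexa.com - - [26/Nov/2004:14:11:30 +0000] "GET / HTTP/1.0" 200 464 "-" "ia_archiver"',
]

if __name__ == "__main__":
    for host, agent in robots_txt_requesters(SAMPLE_LOG):
        print(host, agent)
```

The last quoted field is the user agent, which is where "ia_archiver" shows up. A user agent is whatever the client chooses to send, so it is a hint about who is visiting rather than proof.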

Now, I'd not choose to add the Alexa toolbar to my own machines, but I'm curious: why do you consider its attempt to index your site to be the actions of a pest?

regards

Memetic

P.S. Looking for your robots.txt file means the bot is being well behaved. The following might be of interest:

http://www.robotstxt.org/wc/exclusion-admin.html
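To show what "well behaved" means in practice, here is a rough Python sketch of the check a polite crawler makes before fetching a page, using the standard library's urllib.robotparser. The robots.txt contents, the www.example.com URLs and the name build_parser are assumptions for illustration only; nothing here is taken from the actual site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out ia_archiver entirely,
# and keep every other robot out of /private/ only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS_TXT)
    for agent, url in [
        ("ia_archiver", "http://www.example.com/"),
        ("Googlebot", "http://www.example.com/index.html"),
        ("Googlebot", "http://www.example.com/private/notes.html"),
    ]:
        verdict = "allowed" if rules.can_fetch(agent, url) else "disallowed"
        print(agent, url, verdict)
```

The file only states what the site owner would like. It works because cooperative robots ask for it first and obey it, which is exactly what the /robots.txt requests in the log are. By convention, a 404 for /robots.txt is taken to mean there are no restrictions.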

WG774 1st Dec 2004 14:50

Thanks Memetic.

Guess I overreacted, as I was led to believe by anti-pest software that anything relating to Alexa was sinister... I jumped to the conclusion they were looking to harvest information on keywords / email addresses and sell this info on.

Memetic 2nd Dec 2004 11:37

That anti-pest software is probably talking about the Alexa toolbar.

The toolbar lets you quickly access the Alexa popularity rankings for sites and other Alexa services. It also lets them track which sites you visit so that they can compile the site rankings, which is why it is considered spyware by some. Personally, as they tell you what it does up front and don't attempt to force you to download it, I'd not consider it malware.

Memetic.

mazzy1026 3rd Dec 2004 09:05

What you want is for the big boys, such as Google, to come and spider you. When your site is finished, try submitting it to Google (presuming you want the site indexed)!

http://www.google.com/addurl.html

Regards

Maz :ok

Front_Seat_Dreamer 3rd Dec 2004 22:31

WG774

If you don't want bots trawling your site looking for email addresses, add the following bit of code to the <head> section of your pages.

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

I am sure there are unscrupulous types out there that will ignore this, but it is the official way to do it.
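As a rough illustration of how a cooperative robot might read that tag, here is a short Python sketch using the standard library's html.parser. The class name RobotsMetaParser, the helper robots_directives and the sample page are invented for this example; real crawlers will differ in the details.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches too.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page carrying the tag from the post above.
SAMPLE_PAGE = (
    '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head>'
    "<body>Nothing to see yet.</body></html>"
)

if __name__ == "__main__":
    found = robots_directives(SAMPLE_PAGE)
    print("may index:", "noindex" not in found)
    print("may follow links:", "nofollow" not in found)
```

Note that the robot has to download the page before it can see the tag, so the tag can only ask for the page not to be indexed and its links not to be followed. It does nothing to stop a harvester that simply ignores it.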


All times are GMT.