Computer/Internet Issues & Troubleshooting Anyone with questions about the terribly complex world of computers or the internet should try here. NOT FOR REPORTING ISSUES WITH PPRuNe FORUMS! Please use the subforum "PPRuNe Problems or Queries."

Pest-bots already scanning...

Old 1st Dec 2004, 01:01
  #1 (permalink)  
Thread Starter
 
Join Date: Dec 2003
Location: UK
Posts: 211
Likes: 0
Received 0 Likes on 0 Posts

Hi,

About 21 days ago I registered a new www domain.

The site has only been up for an hour or two, on and off, as it's in development, and a blank page currently comes up when the URL is visited.

I went in tonight to change features on the hosting and thought I'd have a look at the visitor log, which is copied below:

216.144.233.206 - - [22/Nov/2004:01:19:39 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "-"
158-147-185-84.harris.com - - [23/Nov/2004:21:33:41 +0000] "GET / HTTP/1.1" 200 464 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"
crawl12-public.alexa.com - - [26/Nov/2004:14:11:30 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "ia_archiver"
crawl12-public.alexa.com - - [26/Nov/2004:14:11:30 +0000] "GET / HTTP/1.0" 200 464 "-" "ia_archiver"
So pests are scanning for my robots.txt file already!

Excuse me if this is stupid, but why are they scanning my robots.txt file? Isn't that the text file that gets you into search engines? How do they extrapolate data from it?

Can you take precautions against them? What are they doing?


Yours confused

Thanks in advance
WG774 is offline  
Old 1st Dec 2004, 13:30
  #2 (permalink)  
Supercalifragilistic
expialidocious
 
Join Date: Sep 2001
Location: Essex, UK
Posts: 588
Likes: 0
Received 0 Likes on 0 Posts
Why do you say pest?

Hi WG774,

It looks like your site is being crawled by the bot for www.alexa.com.

Now I'd not choose to add the Alexa tool bar to my own machines, but I'm curious: why do you consider its attempt to index your site as the actions of a pest?

regards

Memetic

P.S. Looking for your robots.txt file means the bot is being well behaved. The following might be of interest:

http://www.robotstxt.org/wc/exclusion-admin.html
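
To illustrate what a well-behaved bot does with that file, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the paths in it are made up for the example; only the `ia_archiver` agent name comes from the log above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the rules and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks this question before requesting each page.
for agent, path in [
    ("ia_archiver", "/"),
    ("Googlebot", "/index.html"),
    ("Googlebot", "/private/notes.html"),
]:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent} -> {path}: {verdict}")
```

A 404 for /robots.txt, as in the log, simply tells the bot there are no restrictions, which is why it went on to fetch the home page.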

Last edited by Memetic; 1st Dec 2004 at 14:11.
Memetic is offline  
Old 1st Dec 2004, 14:50
  #3 (permalink)  
Thread Starter
 
Join Date: Dec 2003
Location: UK
Posts: 211
Likes: 0
Received 0 Likes on 0 Posts
Thanks Memetic.

Guess I overreacted, as I was led to believe by anti-pest software that anything relating to Alexa was sinister... I jumped to the conclusion they were looking to harvest information on keywords / email addresses and sell this info on.
WG774 is offline  
Old 2nd Dec 2004, 11:37
  #4 (permalink)  
Supercalifragilistic
expialidocious
 
Join Date: Sep 2001
Location: Essex, UK
Posts: 588
Likes: 0
Received 0 Likes on 0 Posts
That Anti-Pest software is probably talking about the Alexa tool bar.

The tool bar lets you quickly access the Alexa popularity rankings for sites and other Alexa services. It also lets them track which sites you visit so that they can compile the site rankings, which is why it is considered spyware by some. Personally, as they tell you what it does up front and don't attempt to force you to download it, I'd not consider it malware.

Memetic.
Memetic is offline  
Old 3rd Dec 2004, 09:05
  #5 (permalink)  

Spicy Meatball
 
Join Date: Jan 2004
Location: Liverpool UK
Age: 41
Posts: 1,115
Likes: 0
Received 0 Likes on 0 Posts
What you want is for the big boys, such as Google, to come and spider you. When your site is finished, try submitting it to Google (presuming you want the site indexed)!

http://www.google.com/addurl.html

Regards

Maz
mazzy1026 is offline  
Old 3rd Dec 2004, 22:31
  #6 (permalink)  
 
Join Date: Jul 2003
Location: Scotland
Posts: 151
Likes: 0
Received 0 Likes on 0 Posts
WG774

If you don't want bots trawling your site looking for email addresses, add the following tag to the <head> section of each page. It asks robots not to index the page or follow its links.

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

I am sure there are unscrupulous types out there that will ignore this but it is the official way to do it.
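
As a rough sketch of how a compliant bot might read that tag, the snippet below uses Python's standard `html.parser`. The class and function names (`RobotsMetaParser`, `robots_directives`) and the sample page are hypothetical, made up for illustration; real crawlers will differ.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for item in attr.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page using the tag from the post above.
PAGE = (
    "<html><head>"
    '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
    "</head><body>Hello</body></html>"
)

found = robots_directives(PAGE)
print("may index:", "noindex" not in found)
print("may follow links:", "nofollow" not in found)
```

Nothing enforces this: the bot has to choose to run a check like this one, so a harvester that ignores the tag is not stopped by it.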
Front_Seat_Dreamer is offline  
