28, May 2012

does anyone have a list of search engine bot IP#'s? - webmaster forum

 
Webdigity webmaster forums
aka J Love
Community Supporter ?
Bill Gates is my home boy
*****
Gender: Male
Posts: 886
1148 credits
Members referred : 4



« on: Jan 03, 2006, 06:33:10 am »

Does anyone have a good list of the IP addresses of the various search engine bots, like Google, Yahoo, MSN, etc.?


Last blog : phpHaze 1.59.1 in Development
I am a metal monkey!
Administrator
Community Supporter ?
Jedai Sword Master
*****
Gender: Male
Posts: 5799
46391 credits
Members referred : 3



« Reply #1 on: Jan 03, 2006, 07:41:39 am »

There is no such list.

The only way to identify a crawler visit is by the visitor's browser, i.e. the user agent string it sends.
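
To illustrate what "identifying by the browser" can look like, here is a minimal sketch that checks the user agent string for a few well-known crawler tokens. The function name `identify_crawler` and the short token list are assumptions made for this example, not a complete or authoritative list.

```php
<?php
// Hypothetical helper: map well-known crawler tokens to a readable name.
// The token list is an assumption and deliberately short; extend it as needed.
function identify_crawler(string $userAgent): ?string
{
    $tokens = [
        'Googlebot' => 'Google',
        'Slurp'     => 'Yahoo',
        'msnbot'    => 'MSN',
    ];
    foreach ($tokens as $token => $name) {
        // stripos() is case-insensitive and returns false when the token is absent.
        if (stripos($userAgent, $token) !== false) {
            return $name;
        }
    }
    return null; // not a crawler we know about (or a blank user agent)
}

// A live script would pass $_SERVER['HTTP_USER_AGENT'] ?? '' instead of a literal.
echo identify_crawler('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)') ?? 'unknown', "\n";
```

Keep in mind that any client can send any user agent string, so a match only tells you what the visitor claims to be.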

Trial and Error, my two best teachers
Join us @ facebook or twitter

Last blog : Butterfly Marketing 2.0
aka J Love



« Reply #2 on: Jan 03, 2006, 07:56:31 am »

What exactly about the browser they use tips you off? I can monitor this with HCL, so that's no problem; I just need to know what to look for.


I am a metal monkey!



« Reply #3 on: Jan 03, 2006, 09:11:47 am »


My Name is Enigo Montoya
*
Posts: 32
216 credits
Members referred : 0


« Reply #4 on: Jan 17, 2006, 08:35:46 am »

method, what will you do with that?

Cloaking?
aka J Love



« Reply #5 on: Jan 17, 2006, 09:10:05 am »

No. When I'm watching my visitors with Help Center Live, I like to know who's who, but search bots show up as blank, so it's useless.


Sandwich Artist
*
Posts: 24
56 credits
Members referred : 0


« Reply #6 on: Mar 13, 2006, 02:30:19 pm »

Ummm, there was a list compiled a while ago on this site: http://johnny.ihackstuff.com (it's a really slow site; I think the list was in the forums).
WebDigity Gangsta
***
Posts: 105
564 credits
Members referred : 0



« Reply #7 on: Mar 25, 2006, 03:05:07 pm »

I have a list of a lot of spider IPs (it doesn't cover every spider, but it's a good start).

http://www.xavierforum.com/files/spider_ips.txt
I am a metal monkey!



« Reply #8 on: Mar 25, 2006, 04:36:54 pm »

Quote:
I have a list of a lot of spider IPs (it's not all spiders, but it's a good start).

http://www.xavierforum.com/files/spider_ips.txt

Interesting list.

Did you use some kind of software to populate this?

WebDigity Gangsta



« Reply #9 on: Mar 25, 2006, 04:46:22 pm »

Well, I went through my server logs for the last year, and that list is the result. :)
I am a metal monkey!



« Reply #10 on: Mar 25, 2006, 04:59:49 pm »

Oh, I see. You did it the hard way, I guess :)

You can also have a script do this job.
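
As a rough idea of what such a script could look like, the sketch below pulls the IPs of crawler-looking requests out of access log lines. It assumes the Apache "combined" log format (IP first, user agent as the last quoted field); the function name `spider_ips_from_log` and the default token list are made up for this example.

```php
<?php
// Hypothetical helper: collect the IPs of requests whose user agent contains a crawler token.
// Assumes Apache "combined" format: IP first, user agent as the last quoted field.
function spider_ips_from_log(array $lines, array $tokens = ['Googlebot', 'Slurp', 'msnbot']): array
{
    $ips = [];
    foreach ($lines as $line) {
        // Capture the leading IP and the final quoted string (the user agent).
        if (!preg_match('/^(\S+) .*"([^"]*)"\s*$/', $line, $m)) {
            continue; // skip lines that do not look like combined-format entries
        }
        foreach ($tokens as $token) {
            if (stripos($m[2], $token) !== false) {
                $ips[$m[1]] = true; // array keys de-duplicate repeated visits
                break;
            }
        }
    }
    return array_keys($ips);
}

// Usage (the path is a placeholder):
// $lines = file('/path/to/access.log', FILE_IGNORE_NEW_LINES);
// print_r(spider_ips_from_log($lines));
```

Because it relies on the user agent alone, the result has the same limits as any user agent check: it lists clients that claim to be spiders.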

WebDigity Gangsta



« Reply #11 on: Mar 25, 2006, 06:33:56 pm »

I have a spider trap too; I simply needed to speed up the work a little bit.

This is what my .htaccess looks like:

Code:
RewriteEngine on
RewriteRule ^robots\.txt$ robots.php [L,NC]

Then I have a script called robots.php:

Code:
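<?php
// Serve the output as plain text, as crawlers expect for robots.txt.
header('Content-Type: text/plain');

$server = "localhost"; // MySQL hostname
$username = ""; // MySQL username
$password = ""; // MySQL password
$dbname = ""; // MySQL db name

// mysqli replaces the mysql_* functions, which were removed in PHP 7.
mysqli_report(MYSQLI_REPORT_OFF); // report failures through return values, not exceptions
$db = mysqli_connect($server, $username, $password, $dbname) or die(mysqli_connect_error());

// Escape client-supplied values before putting them into SQL.
$ip = mysqli_real_escape_string($db, $_SERVER['REMOTE_ADDR']);
$ua = mysqli_real_escape_string($db, $_SERVER['HTTP_USER_AGENT'] ?? '');

// Log every client that requests robots.txt as a spider.
mysqli_query($db, "INSERT INTO spiderips VALUES(INET_ATON('$ip'),'$ua',". time() .")");

mysqli_close($db);
?>
User-agent: *
Disallow: /norobots/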

Every time a spider requests my robots.txt, I catch it. Then I have this code on some of my other pages (not all of them):

Code:
<?php
$server = "localhost"; // MySQL hostname
$username = ""; // MySQL username
$password = ""; // MySQL password
$dbname = ""; // MySQL db name

// mysqli replaces the mysql_* functions, which were removed in PHP 7.
mysqli_report(MYSQLI_REPORT_OFF); // report failures through return values, not exceptions
$db = mysqli_connect($server, $username, $password, $dbname) or die(mysqli_connect_error());

// Escape client-supplied values before putting them into SQL.
$ip = mysqli_real_escape_string($db, $_SERVER['REMOTE_ADDR']);
$ua = mysqli_real_escape_string($db, $_SERVER['HTTP_USER_AGENT'] ?? '');

// Only log visitors whose IP or user agent already matches a known spider.
$spider_info = mysqli_query($db, "SELECT * FROM spiderips WHERE ip = INET_ATON('$ip') OR useragent = '$ua'");
if ($spider_info && mysqli_num_rows($spider_info))
{
    // Try to insert; if that fails (e.g. the IP is already stored), update the existing row.
    if (!mysqli_query($db, "INSERT INTO spiderips VALUES(INET_ATON('$ip'),'$ua',". time() .")"))
    {
        mysqli_query($db, "UPDATE spiderips SET useragent = '$ua', lastvisit = '". time() ."' WHERE ip = INET_ATON('$ip')");
    }
}
mysqli_close($db);
?>

This way I get the spiders' user agents when they request my robots.txt, and I then pick up more IP addresses whenever a visitor's user agent matches one already stored as a spider. I also get the last time a spider visited one of my sites.
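
To make the two-step logic easier to follow without a database, here is a small in-memory sketch of the same idea. The class `SpiderTrap` and its method names are hypothetical, the array stands in for the spiderips table, and `ip2long()` / `long2ip()` play the role of MySQL's INET_ATON / INET_NTOA. Unlike the plain INSERT in robots.php, step 1 here simply overwrites an existing entry.

```php
<?php
// In-memory stand-in for the spiderips table, keyed by ip2long() (PHP's counterpart to INET_ATON).
// Class and method names are hypothetical.
final class SpiderTrap
{
    /** @var array<int, array{useragent: string, lastvisit: int}> */
    private $rows = [];

    // Step 1: any client that requests robots.txt is recorded as a spider.
    public function hitRobotsTxt(string $ip, string $ua, int $now): void
    {
        $this->rows[ip2long($ip)] = ['useragent' => $ua, 'lastvisit' => $now];
    }

    // Step 2: on other pages, record the visit only if the IP or user agent is already known.
    public function hitPage(string $ip, string $ua, int $now): bool
    {
        $key = ip2long($ip);
        $known = isset($this->rows[$key]);
        foreach ($this->rows as $row) {
            if ($row['useragent'] === $ua) {
                $known = true;
                break;
            }
        }
        if ($known) {
            $this->rows[$key] = ['useragent' => $ua, 'lastvisit' => $now]; // insert or update
        }
        return $known;
    }

    /** @return string[] dotted-quad IPs currently stored */
    public function ips(): array
    {
        return array_map('long2ip', array_keys($this->rows));
    }
}

$trap = new SpiderTrap();
$trap->hitRobotsTxt('66.249.66.1', 'Googlebot/2.1', 1143300000);
$trap->hitPage('66.249.66.2', 'Googlebot/2.1', 1143300060); // new IP, known user agent
$trap->hitPage('10.0.0.5', 'Firefox/1.5', 1143300120);      // ordinary visitor, ignored
print_r($trap->ips());
```

The second call shows how the list grows: the IP is new, but the user agent was already caught on robots.txt, so the address is added.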
I am a metal monkey!



« Reply #12 on: Mar 25, 2006, 08:36:50 pm »

That's very clever. Congrats, and thanks for sharing.

BTW, you can make this work better if you use the LOW_PRIORITY modifier in your inserts and updates :)

Instead of this:

Code:
$ip = mysqli_real_escape_string($db, $_SERVER['REMOTE_ADDR']);
$ua = mysqli_real_escape_string($db, $_SERVER['HTTP_USER_AGENT'] ?? '');
$spider_info = mysqli_query($db, "SELECT * FROM spiderips WHERE ip = INET_ATON('$ip') OR useragent = '$ua'");
if ($spider_info && mysqli_num_rows($spider_info))
{
    if (!mysqli_query($db, "INSERT INTO spiderips VALUES(INET_ATON('$ip'),'$ua',". time() .")"))
    {
        mysqli_query($db, "UPDATE spiderips SET useragent = '$ua', lastvisit = '". time() ."' WHERE ip = INET_ATON('$ip')");
    }
}

you can keep the SELECT check and replace the INSERT/UPDATE pair inside it with a single statement (REPLACE only overwrites an existing row when ip has a PRIMARY KEY or UNIQUE index; otherwise it just inserts):

Code:
mysqli_query($db, "REPLACE LOW_PRIORITY INTO spiderips VALUES(INET_ATON('$ip'),'$ua',". time() .")");

Chicken-run Manager
*
Posts: 9
58 credits
Members referred : 0


« Reply #13 on: Mar 25, 2006, 08:48:23 pm »

Cool script, I will use it too.

Thanks for sharing :)