7, September 2008

Question, Tutorial: Detecting Googlebot with PHP - webmaster forum

 

Author Topic: Question, Tutorial: Detecting Googlebot with PHP  (Read 647 times)
aka J Love
« on: Sep 24, 2007, 12:49:11 AM »

First off: I'm not asking for help with the "Detecting Googlebot with PHP" tutorial itself :)

I followed it and can successfully detect when Googlebot hits the script, so it's a great tutorial. My question is whether the same concept can be used for MSNBot and Yahoo! Slurp. I have tried the following strings, and they don't seem to pick up the visits (even though the bots are obviously crawling):

Yahoo! Slurp:
user agent string = help.yahoo.com/help/us/ysearch/slurp (found in the wiki article on Slurp)

MSNBot/Livebot:
user agent string = search.live.com (found in Webalizer statistics)

Since these bots are detected on Webdigity, I assume I'm doing something wrong, such as using the wrong strings. Here is the PHP code that actually detects them:

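Code:
<?php
// Load every account flagged as a spider.
$spiders = dbquery("SELECT * FROM ".PRE."users WHERE is_spider = '1'");
if (dbrows($spiders)) {
    while ($bot = dbarray($spiders)) {
        // Does the user agent contain this spider's name?
        if (strstr($_SERVER['HTTP_USER_AGENT'], $bot['username']) == true) {
            // Reverse DNS lookup of the visitor's IP.
            $host = gethostbyaddr(USER_IP);
            // Compare the last 13 characters of the hostname with the stored location.
            if (substr($host, (strlen($host)-13)) == $bot['location']) {
                $spider_visited = dbquery("UPDATE LOW_PRIORITY ".PRE."users SET `lastvisit` = '".time()."', `ip` = '".USER_IP."' WHERE `id` = '".$bot['id']."'");
            }
        }
    }
}
?>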


Thanks in advance for your help :)

Global Moderator
« Reply #1 on: Sep 24, 2007, 01:07:20 AM »

"Yahoo! Slurp" is the Yahoo user agent, and "Msnbot" is the one for MSN, if I'm not mistaken...


I am a metal monkey!
Administrator
« Reply #2 on: Sep 24, 2007, 12:53:07 PM »

Nico is right. Also, you should use strpos() for this, because most bots also have a version identifier that changes (e.g. Mediabot 1.0).

aka J Love
« Reply #3 on: Sep 24, 2007, 08:56:01 PM »

Thanks guys :) I have updated the details for both bots and am waiting on visits now. And Nik, I followed your tutorial exactly. Does it show strstr in the tutorial? I know why to use strpos (much faster, of course). Thanks for the tip and the help, as always.

This is the code I'm using now, Nik. Mind verifying it? I have changed it a bit:

$spiders = dbquery("SELECT * FROM ".PRE."users WHERE is_spider = '1'");
if (dbrows($spiders)) {
    while ($bot = dbarray($spiders)) {
        if (strpos($_SERVER['HTTP_USER_AGENT'], $bot['username'])) {
        //if (strstr($_SERVER['HTTP_USER_AGENT'], $bot['username']) == true){
            $host = gethostbyaddr(USER_IP);
            if (substr($host, (strlen($host)-13)) == $bot['location']) {
                $spider_visited = dbquery("UPDATE ".PRE."users SET `lastvisit` = '".time()."' WHERE `id` = '".$bot['id']."'");
                // Only rewrite the IP and hostname when the IP has changed.
                if ($bot['ip'] != USER_IP) dbquery("UPDATE LOW_PRIORITY ".PRE."users SET `ip` = '".USER_IP."', `hostname` = '$host' WHERE `id` = '".$bot['id']."'");
            }
        }
    }
}
« Last Edit: Sep 24, 2007, 10:14:23 PM by Meth0d »

I am a metal monkey!
Administrator
« Reply #4 on: Sep 24, 2007, 10:36:18 PM »

This:
if (strpos($_SERVER['HTTP_USER_AGENT'], $bot['username']))

should be

if (strpos($_SERVER['HTTP_USER_AGENT'], $bot['username'])!==false)

BTW why do you use the gethostbyaddr() call?
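
To spell out why the strict comparison matters: strpos() returns the offset of the match, and an offset of 0 counts as false in a plain if. A minimal sketch follows; the user-agent string is only a sample, and $loose and $strict are names made up for the illustration.

```php
<?php
// Sample user agent that starts with the bot name (illustrative only).
$agent = 'msnbot/1.0 (+http://search.msn.com/msnbot.htm)';

// The needle sits at the very start of the string, so the offset is 0.
$pos = strpos($agent, 'msnbot');

$loose  = (bool) $pos;        // what a plain if ($pos) would see
$strict = ($pos !== false);   // tells offset 0 apart from "not found"

var_dump($pos);     // int(0)
var_dump($loose);   // bool(false): the match is missed
var_dump($strict);  // bool(true)
```

So any bot whose name is the first thing in the user agent would be skipped by the loose version.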

aka J Love
« Reply #5 on: Sep 25, 2007, 01:42:21 AM »

I don't suppose it is necessary for the spiders, but when users visit the site it checks their IP against the stored one, and if it has changed it also updates their stored hostname. That way the hostname lookup doesn't run on every page load. The reason for storing it for users is the blacklist, which can ban by IP, email, and hostname.

Thanks for the tip on strpos. I've noticed it behaves differently without the strict comparison operators "===" or "!==", so I've taken that down as a fundamental note. Thanks again, Nikolas!!

Still working on MSNBot and Yahoo! Slurp; they haven't visited yet.
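
The "only look up the hostname when the IP changed" idea can be sketched roughly like this. It is not the actual site code: refresh_hostname(), the $row layout and the injectable $resolver are all hypothetical, and the fake resolver is there only so the demo runs without network access (real code would leave the default, gethostbyaddr).

```php
<?php
// Hypothetical sketch: re-resolve the hostname only when the stored IP differs.
// $row mimics a users-table record; $resolver stands in for gethostbyaddr().
function refresh_hostname(array $row, $currentIp, $resolver = 'gethostbyaddr')
{
    if ($row['ip'] === $currentIp) {
        return $row; // IP unchanged: skip the reverse DNS lookup
    }
    $row['ip'] = $currentIp;
    $row['hostname'] = call_user_func($resolver, $currentIp);
    return $row;
}

// Fake resolver that counts how often it is called.
$lookups = 0;
$fake = function ($ip) use (&$lookups) {
    $lookups++;
    return 'host-for-' . $ip;
};

$row = ['ip' => '192.0.2.1', 'hostname' => 'host-for-192.0.2.1'];
$row = refresh_hostname($row, '192.0.2.1', $fake); // same IP, no lookup
$row = refresh_hostname($row, '192.0.2.7', $fake); // new IP, one lookup

echo $row['hostname'], ' after ', $lookups, " lookup(s)\n";
```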

aka J Love
« Reply #6 on: Sep 26, 2007, 09:56:14 PM »

Yeah... I don't think those two solutions for MSN and Yahoo are working, guys :S Here is the data it's pulling for them, and the evaluated code:

Note: "Data" below is case sensitive.

Original PHP code that loops over all spiders:
//Added for detection of spiders
$spiders = dbquery("SELECT * FROM ".PRE."users WHERE is_spider = '1'");
if (dbrows($spiders)) {
    while ($bot = dbarray($spiders)) {
        if (strpos($_SERVER['HTTP_USER_AGENT'], $bot['username']) !== false) {
            $host = gethostbyaddr(USER_IP);
            if (substr($host, (strlen($host)-13)) == $bot['location']) {
                $spider_visited = dbquery("UPDATE ".PRE."users SET `lastvisit` = '".time()."' WHERE `id` = '".$bot['id']."'");
                // Only rewrite the IP and hostname when the IP has changed.
                if ($bot['ip'] != USER_IP) dbquery("UPDATE LOW_PRIORITY ".PRE."users SET `ip` = '".USER_IP."', `hostname` = '$host' WHERE `id` = '".$bot['id']."'");
            }
        }
    }
}


MSNBot
Data: (name: 'MSNbot'; agent: 'Msnbot')
Code evaluated:
if (strpos($_SERVER['HTTP_USER_AGENT'], "MSNBot") !== false) {
    $host = gethostbyaddr(USER_IP);
    if (substr($host, (strlen($host)-13)) == "Msnbot") {
        $spider_visited = dbquery("UPDATE ".PRE."users SET `lastvisit` = '".time()."' WHERE `id` = '".$bot['id']."'");
        if ($bot['ip'] != USER_IP) dbquery("UPDATE LOW_PRIORITY ".PRE."users SET `ip` = '".USER_IP."', `hostname` = '$host' WHERE `id` = '".$bot['id']."'");
    }
}


Yahoo! Slurp
Data: (name: 'Yahoo! Slurp'; agent: 'Yahoo! Slurp')
Code evaluated:
if (strpos($_SERVER['HTTP_USER_AGENT'], "Yahoo! Slurp") !== false) {
    $host = gethostbyaddr(USER_IP);
    if (substr($host, (strlen($host)-13)) == "Yahoo! Slurp") {
        $spider_visited = dbquery("UPDATE ".PRE."users SET `lastvisit` = '".time()."' WHERE `id` = '".$bot['id']."'");
        if ($bot['ip'] != USER_IP) dbquery("UPDATE LOW_PRIORITY ".PRE."users SET `ip` = '".USER_IP."', `hostname` = '$host' WHERE `id` = '".$bot['id']."'");
    }
}




With this, MSNBot and Yahoo! haven't visited yet (according to this script), but I've been watching my stats closely for a few days, and they are visiting there.
« Last Edit: Sep 26, 2007, 11:23:22 PM by Meth0d »
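
One possible reason the evaluated checks never succeed: the inner test compares the last 13 characters of the reverse DNS hostname with the stored value, and 13 happens to be the length of "googlebot.com". A bot name such as "Msnbot" can never equal a 13-character hostname suffix. Below is a minimal sketch of a suffix check that uses the stored value's own length; host_ends_with() is a hypothetical helper and the hostnames are illustrative samples, not taken from real logs.

```php
<?php
// Hypothetical helper: true when $host ends with $suffix (case-insensitive),
// whatever the length of the suffix is.
function host_ends_with($host, $suffix)
{
    $len = strlen($suffix);
    if ($len === 0 || $len > strlen($host)) {
        return false;
    }
    // A negative start offset takes the last $len characters of $host.
    return strcasecmp(substr($host, -$len), $suffix) === 0;
}

// Sample hostnames for illustration only.
$google = host_ends_with('crawl-66-249-66-1.googlebot.com', 'googlebot.com');
$msn    = host_ends_with('example-crawler.search.msn.com', 'search.msn.com');
$wrong  = host_ends_with('example-crawler.search.msn.com', 'Msnbot');

var_dump($google); // bool(true)
var_dump($msn);    // bool(true)
var_dump($wrong);  // bool(false): a bot name is not a hostname suffix
```

A stricter variant would store the suffix with its leading dot (".googlebot.com"), so that a host like "notgooglebot.com" does not pass.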

I am a metal monkey!
Administrator
« Reply #7 on: Sep 26, 2007, 11:57:16 PM »

Here are a few rules for detecting search engine bots:

if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 6) == "msnbot" ) { /* ... */ }
else if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 20) == "Mediapartners-Google" ) { /* ... */ } //adsense
else if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 5) == "Alexa" ) { /* ... */ }
else if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 4) == "curl" ) { /* ... */ }
else if ( strstr($_SERVER['HTTP_USER_AGENT'], "Googlebot") == true ) { /* ... */ }
else if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 22) == "Mozilla/5.0 (Slurp/si;" ) { /* ... */ }
else if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 12) == "Lycos_Spider" ) { /* ... */ }
else if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 11) == "Baiduspider" ) { /* ... */ }
else if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 17) == "Accoona-AI-Agent/" ) { /* ... */ }
else if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 18) == "Chitika ContentHit" ) { /* ... */ }
else if ( substr($_SERVER['HTTP_USER_AGENT'], 0, 37) == "Mozilla/5.0 (compatible; Yahoo! Slurp" ) { /* ... */ }
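
The same kind of rules can also be kept in a lookup table instead of an if/else chain. This is only a sketch under a few assumptions: identify_bot() and the labels are hypothetical, only some of the prefixes are listed, and strncasecmp() makes the prefix test case-insensitive, which is a deliberate difference from the case-sensitive substr() comparisons.

```php
<?php
// Hypothetical table: user-agent prefix => label.
// The prefixes come from the rules above; the labels are made up.
$prefixes = [
    'msnbot'                                => 'MSN',
    'Mediapartners-Google'                  => 'AdSense',
    'Baiduspider'                           => 'Baidu',
    'Mozilla/5.0 (compatible; Yahoo! Slurp' => 'Yahoo! Slurp',
];

function identify_bot($agent, array $prefixes)
{
    foreach ($prefixes as $prefix => $label) {
        // Compare only the first strlen($prefix) bytes, ignoring case.
        if (strncasecmp($agent, $prefix, strlen($prefix)) === 0) {
            return $label;
        }
    }
    // Googlebot's name appears mid-string, so it needs a substring test.
    if (stripos($agent, 'Googlebot') !== false) {
        return 'Googlebot';
    }
    return null; // not a known bot
}

// Sample user agents for illustration.
$msn     = identify_bot('MSNBot/1.0 (+http://search.msn.com/msnbot.htm)', $prefixes);
$google  = identify_bot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', $prefixes);
$browser = identify_bot('Mozilla/5.0 (Windows; U; Windows NT 5.1)', $prefixes);

var_dump($msn, $google, $browser);
```

Here $msn comes back as "MSN" even though the sample agent is written "MSNBot", $google as "Googlebot", and $browser as null.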

aka J Love
« Reply #8 on: Sep 27, 2007, 06:31:35 AM »

Are the two that say Slurp both for the same Yahoo! Slurp bot?

Edit: I also see that it uses strstr for Google; that must be why my first version used it ;)
« Last Edit: Sep 27, 2007, 06:43:42 AM by Meth0d »

I am a metal monkey!
Administrator
« Reply #9 on: Sep 27, 2007, 11:51:21 AM »

The second one is the correct one for Yahoo.

aka J Love
« Reply #10 on: Sep 27, 2007, 10:27:58 PM »

Thanks very much, Nik! This works great for MSNbot and MagpieRSS (so far; Webdigity RSS is visiting quite often!)

Yahoo! Slurp hasn't visited yet, but it surely will soon. Thanks again, Nikolas and Webdigity!

I am a metal monkey!
Administrator
« Reply #11 on: Sep 28, 2007, 11:19:46 AM »

No worries, man. That's what friends are for :)
