Recently I ran into a major problem on our server. After we installed mod_evasive, the module treated Googlebot like a DDoS attacker. As a result, there are thousands of pages on our sites that Google can't crawl, and we are losing a lot of traffic.
The solution is to find out which IPs Google uses to visit your sites and whitelist them in the mod_evasive configuration, so they can visit without rate limits. If we knew all of Google's IPs, we would add each one with mod_evasive's DOSWhitelist directive in the Apache configuration file.
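As a rough sketch, a whitelist block could look like the one below. The addresses are placeholders from a documentation range, not real Google IPs, and the module identifier in the IfModule line is an assumption that depends on how your distribution packages mod_evasive (some builds use a different name). DOSWhitelist accepts a single IP, or wildcards on up to the last three octets.

```apache
<IfModule mod_evasive20.c>
    # Placeholder addresses: replace them with IPs you have verified yourself.
    DOSWhitelist 192.0.2.10
    # A wildcard whitelists the whole last octet.
    DOSWhitelist 192.0.2.*
</IfModule>
```

Reload Apache after changing the file so the new whitelist takes effect.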
But to do that, we need to know which IPs Google uses to crawl our website. Keep in mind that many spam bots visit with the Googlebot user agent, so we need to check that each IP has a correct reverse DNS entry.
Here is a way to detect those bot visits with PHP:
Code:
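<?php
// Default to '' so a request without a User-Agent header raises no notice.
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
if (stripos($agent, 'Googlebot') !== false) {
    // The client has the Googlebot user agent, but is it a real Google bot?
    $host = gethostbyaddr($_SERVER['REMOTE_ADDR']);
    // Require the leading dot so a host like "evilgooglebot.com" does not pass.
    if (is_string($host) && substr($host, -14) === '.googlebot.com') {
        // real bot
    } else {
        // fake bot
    }
}
?>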
This snippet does the detection work. You can extend it a little if you want it to log the IPs.
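One possible extension, as a sketch rather than a finished solution: a reverse DNS name alone can be forged by whoever controls the PTR record for an IP, so the version below also resolves the hostname forward and checks that it maps back to the same IP, then appends the result to a log file. The function names `is_real_googlebot` and `log_googlebot_visit`, the injectable resolver parameters, and the log file name are all hypothetical. It assumes IPv4 clients, since `gethostbynamel()` returns only IPv4 addresses.

```php
<?php
// Forward-confirmed reverse DNS check (hypothetical helper, IPv4 only).
// $reverse and $forward default to PHP's resolvers and can be swapped out.
function is_real_googlebot($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    $host = call_user_func($reverse, $ip);
    // gethostbyaddr() returns the IP unchanged when there is no PTR record.
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    // Accept only hosts under googlebot.com or google.com.
    if (!preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    // The hostname must resolve back to the IP that connected.
    $ips = call_user_func($forward, $host);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Append one line per visit: ISO 8601 timestamp, IP, verdict.
function log_googlebot_visit($ip, $isReal, $logFile)
{
    $line = sprintf("%s %s %s\n", date('c'), $ip, $isReal ? 'real' : 'fake');
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

// Runs only for web requests that claim to be Googlebot.
if (isset($_SERVER['REMOTE_ADDR'], $_SERVER['HTTP_USER_AGENT'])
    && stripos($_SERVER['HTTP_USER_AGENT'], 'Googlebot') !== false) {
    $ip = $_SERVER['REMOTE_ADDR'];
    // Hypothetical log location: adjust the path for your server.
    log_googlebot_visit($ip, is_real_googlebot($ip), __DIR__ . '/googlebot-visits.log');
}
```

The IPs logged as "real" are the candidates for the mod_evasive whitelist. DNS lookups are slow, so on a busy site you would probably want to cache the verdict per IP instead of resolving on every request.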