14, October 2008

Majestic-12 a massive crawler - webmaster forum

 

Author Topic: Majestic-12 a massive crawler  (Read 1301 times)
Global Moderator
Community Supporter ?
Jedai Sword Master
*****
Gender: Male
Posts: 6352
38936 credits
Members referred : 374


It's time to use PHP5!


« on: Sep 11, 2006, 09:27:20 AM »

Hello,

I just checked the stats for a website and found 50,000 page views generated within 6 weeks by this bot:
http://www.majestic12.co.uk/projects/dsearch/mj12bot.php

That's the same number as the visitors on the website...
Thank you, majestic12...
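For anyone who wants to reproduce this kind of count from raw logs rather than a stats package, here is a minimal sketch. It assumes the server writes Apache "combined" format, where the user agent is the last quoted field. The sample lines, the simplified user-agent strings and the function name `count_hits` are made up for illustration and are not taken from the site discussed here.

```python
import re

# Hypothetical sample lines in Apache "combined" log format (not real traffic).
SAMPLE_LOG = [
    '192.0.2.1 - - [11/Sep/2006:09:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "MJ12bot"',
    '192.0.2.2 - - [11/Sep/2006:09:00:05 +0000] "GET /a.html HTTP/1.1" 200 734 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [11/Sep/2006:09:00:09 +0000] "GET /b.html HTTP/1.1" 200 291 "-" "MJ12bot"',
]

# In the combined format the user agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_hits(lines, needle):
    """Count log lines whose user-agent field contains `needle` (case-insensitive)."""
    needle = needle.lower()
    hits = 0
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if match and needle in match.group(1).lower():
            hits += 1
    return hits


print(count_hits(SAMPLE_LOG, "MJ12bot"))  # prints 2
```

Comparing that number with the total line count gives a rough idea of how much of the traffic is the bot.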


Last blog : Upload images for usage in TinyMCE
I am a metal monkey!
Administrator
Community Supporter ?
Jedai Sword Master
*****
Gender: Male
Posts: 8124
41701 credits
Members referred : 3



« Reply #1 on: Sep 11, 2006, 10:12:56 AM »

Maybe you need this robots.txt directive:

Code:
User-Agent: MJ12bot
Crawl-Delay: 5
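To see how a parser reads these two lines, here is a small sketch using Python's standard `urllib.robotparser`. It only shows what that one parser extracts; whether MJ12bot itself honours `Crawl-Delay` is not something this code can tell you. The helper name `crawl_delay_for` is just for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: MJ12bot
Crawl-Delay: 5
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-Delay that applies to `user_agent`, or None if there is none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for(ROBOTS_TXT, "MJ12bot"))       # prints 5
print(crawl_delay_for(ROBOTS_TXT, "SomeOtherBot"))  # prints None
```

The delay is in seconds between requests, so this only slows the bot down; it does not keep it out.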

Trial and Error my two best teachers
Join us @ facebook

Last blog : Free Unlimited Bandwith and disk space to good to be true?
Global Moderator

« Reply #2 on: Sep 11, 2006, 10:48:08 AM »

Maybe you need this robots.txt directive:

Code:
User-Agent: MJ12bot
Crawl-Delay: 5

I used this one:

Code:
User-Agent: MJ12bot
Disallow: /

This is a strange world with these bots. We website owners have to ask a search engine "please spider me", but there are others stealing hundreds of MB of my bandwidth without asking...
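A quick way to check what the `Disallow: /` rule above means to a standards-following parser is Python's standard `urllib.robotparser`. This is only a sketch: `www.example.com` is a placeholder host, `is_allowed` is a name picked for illustration, and the result says nothing about whether a given bot actually obeys the file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: MJ12bot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; any path on the site gives the same answer.
print(is_allowed(ROBOTS_TXT, "MJ12bot", "http://www.example.com/index.html"))       # prints False
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://www.example.com/index.html"))  # prints True
```

So the rule blocks MJ12bot from the whole site and leaves every other crawler untouched.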


Administrator

« Reply #3 on: Sep 11, 2006, 11:13:47 AM »

Hehe that's true.

BTW the Disallow: / won't help you much.

Maybe you should try Disallow: *

Global Moderator

« Reply #4 on: Sep 11, 2006, 11:17:07 AM »

Hehe that's true.

BTW the Disallow: / won't help you much.

Maybe you should try Disallow: *

That's a nice one, I got this info from their website...


Global Moderator

« Reply #5 on: Nov 27, 2006, 12:05:40 PM »

I used this robots file: http://www.bergtoys.com/robots.txt

with the result of 25,000 crawls since the last post here in September.

What's wrong with them?


Administrator

« Reply #6 on: Nov 27, 2006, 01:13:57 PM »

You mean what is wrong with your robots.txt?

Global Moderator

« Reply #7 on: Nov 27, 2006, 01:32:54 PM »

You mean what is wrong with your robots.txt?
With my file or with the bot.
(I changed the file back today to use the slash, to check if this is the problem.)


Administrator

« Reply #8 on: Nov 27, 2006, 01:50:36 PM »

Use this tool to check your rules:

http://tool.motoricerca.info/robots-checker.phtml

Global Moderator

« Reply #9 on: Nov 27, 2006, 01:54:08 PM »

Then this is wrong ;)

Hehe that's true.

BTW the Disallow: / won't help you much.

Maybe you should try Disallow: *


The following block of code contains some errors. Please, remove all the reported errors and check again this robots.txt file.
Line 16    User-agent: ShopWiki
Line 17    Disallow: *
We advise you to start a file/directory name with a leading slash char (Example: /private.html).
Line 18    

The following block of code DISALLOWS the crawling of all files and directories to the following spiders/robots: MJ12bot
Line 19    User-agent: MJ12bot
Line 20    Disallow: /
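The checker's advice about the leading slash can be approximated with a few lines of Python. This is a rough sketch of that single check, not a reimplementation of the tool above; `find_disallow_problems` is a hypothetical helper, and the sample holds only the two blocks quoted in the report, so its line numbers differ from the ones in the real file.

```python
def find_disallow_problems(robots_txt):
    """Return (line_number, value) pairs for Disallow values without a leading slash."""
    problems = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "disallow":
            continue
        value = value.strip()
        # An empty Disallow value is valid and means "nothing is disallowed".
        if value and not value.startswith("/"):
            problems.append((number, value))
    return problems


SAMPLE = """\
User-agent: ShopWiki
Disallow: *

User-agent: MJ12bot
Disallow: /
"""

print(find_disallow_problems(SAMPLE))  # prints [(2, '*')]
```

Only the `Disallow: *` line is flagged; the MJ12bot block with `Disallow: /` passes, which matches what the tool reported.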


Administrator

« Reply #10 on: Nov 27, 2006, 02:04:34 PM »

Maybe the correct one would be /* ?

Global Moderator

« Reply #11 on: Nov 27, 2006, 02:07:55 PM »

Maybe the correct one would be /* ?

I don't know, this is the information from the tool:


The following block of code DISALLOWS the crawling of all files and directories to the following spiders/robots: MJ12bot


Administrator

« Reply #12 on: Nov 27, 2006, 02:10:54 PM »

Oh right, I thought it was meant to be an error, but it is ok.
