24, July 2008

Majestic-12 a massive crawler - webmaster forum

 

Author Topic: Majestic-12 a massive crawler  (Read 1226 times)
Global Moderator
Community Supporter ?
Jedai Sword Master
*****
Gender: Male
Posts: 6280
38506 credits
Members referred : 374


It's time to use PHP5!


« on: Sep 11, 2006, 09:27:20 AM »

Hello,

I just checked the stats for a website and found 50,000 pageviews generated within 6 weeks by this bot:
http://www.majestic12.co.uk/projects/dsearch/mj12bot.php

That's the same number as the site's human visitors...
Thank you, Majestic-12...


I am a metal monkey!
Administrator
Community Supporter ?
Jedai Sword Master
*****
Gender: Male
Posts: 7975
40807 credits
Members referred : 3



« Reply #1 on: Sep 11, 2006, 10:12:56 AM »

Maybe you need this robots.txt directive:

Code:
User-Agent: MJ12bot
Crawl-Delay:   5
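
If you want to check what delay a rule set like this actually declares, here is a minimal Python sketch using the standard library's urllib.robotparser. It only reports what the file says: Crawl-delay is a non-standard extension, so whether a particular bot honors it is up to the bot. The helper name crawl_delay_for is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# The rules suggested above, copied as-is (extra spaces included).
ROBOTS_TXT = """\
User-Agent: MJ12bot
Crawl-Delay:   5
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) declared for user_agent, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "MJ12bot"))
    print(crawl_delay_for(ROBOTS_TXT, "SomeOtherBot"))
```

A bot with no matching User-agent group (and no `*` group) gets None back, meaning the file declares no delay for it.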

Global Moderator
« Reply #2 on: Sep 11, 2006, 10:48:08 AM »

Quote:
Maybe you need this robots.txt directive:

Code:
User-Agent: MJ12bot
Crawl-Delay:   5

I used this one:

Code:
User-Agent: MJ12bot
Disallow:   /

It's a strange world with these bots. We website owners have to ask a search engine "please spider me", but there are others stealing hundreds of MB of my bandwidth without asking...


Administrator
« Reply #3 on: Sep 11, 2006, 11:13:47 AM »

Hehe, that's true.

BTW, the Disallow: / won't help you much.

Maybe you should try Disallow: *

Global Moderator
« Reply #4 on: Sep 11, 2006, 11:17:07 AM »

Quote:
Hehe, that's true.
BTW, the Disallow: / won't help you much.
Maybe you should try Disallow: *

That's a nice one; I got this info from their website...


Global Moderator
« Reply #5 on: Nov 27, 2006, 12:05:40 PM »

I used this robots file: http://www.bergtoys.com/robots.txt

The result: 25,000 crawls since my last post here in September.

What's wrong with them?
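
To see whether the bot really keeps crawling after a robots.txt change, it helps to count its hits per day in the access log rather than rely on a monthly total. Below is a rough Python sketch. It assumes Apache-style log lines with a bracketed timestamp and the user agent somewhere in the line; the sample lines, the IP addresses and the function name count_bot_hits_per_day are all made up for illustration.

```python
import re
from collections import Counter

# Pulls the date part out of a bracketed timestamp such as
# [27/Nov/2006:12:00:01 +0100] (Apache combined log format is assumed).
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")


def count_bot_hits_per_day(log_lines, bot_token="MJ12bot"):
    """Count the log lines mentioning bot_token, grouped by date."""
    hits = Counter()
    for line in log_lines:
        if bot_token.lower() not in line.lower():
            continue
        match = DATE_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Made-up sample lines, not real log data.
SAMPLE_LOG = [
    '192.0.2.1 - - [27/Nov/2006:12:00:01 +0100] "GET /robots.txt HTTP/1.1" 200 120 "-" "MJ12bot (sample UA)"',
    '192.0.2.1 - - [27/Nov/2006:12:00:05 +0100] "GET /index.html HTTP/1.1" 200 5120 "-" "MJ12bot (sample UA)"',
    '192.0.2.7 - - [27/Nov/2006:12:01:00 +0100] "GET /index.html HTTP/1.1" 200 5120 "-" "SomeBrowser (sample UA)"',
    '192.0.2.1 - - [28/Nov/2006:08:30:00 +0100] "GET /robots.txt HTTP/1.1" 200 120 "-" "MJ12bot (sample UA)"',
]

if __name__ == "__main__":
    for day, count in count_bot_hits_per_day(SAMPLE_LOG).items():
        print(day, count)
```

If the per-day count drops to only /robots.txt requests after the rule change, the bot is obeying the file; if page requests continue, the problem is on the bot's side.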


Administrator
« Reply #6 on: Nov 27, 2006, 01:13:57 PM »

You mean what is wrong with your robots.txt?

Global Moderator
« Reply #7 on: Nov 27, 2006, 01:32:54 PM »

Quote:
You mean what is wrong with your robots.txt?

With my file or with the bot.
(I changed the file back to the slash today, to check whether that is the problem.)


Administrator
« Reply #8 on: Nov 27, 2006, 01:50:36 PM »

Use this tool to check your rules:

http://tool.motoricerca.info/robots-checker.phtml

Global Moderator
« Reply #9 on: Nov 27, 2006, 01:54:08 PM »

Then this is wrong ;)

Quote:
Hehe, that's true.
BTW, the Disallow: / won't help you much.
Maybe you should try Disallow: *


The following block of code contains some errors. Please, remove all the reported errors and check again this robots.txt file.
Line 16    User-agent: ShopWiki
Line 17    Disallow: *
We advise you to start a file/directory name with a leading slash char (Example: /private.html).
Line 18    

The following block of code DISALLOWS the crawling of all files and directories to the following spiders/robots: MJ12bot
Line 19    User-agent: MJ12bot
Line 20    Disallow: /
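
The checker's verdict on the `Disallow: /` block can be reproduced locally. This is a small Python sketch with the standard library's urllib.robotparser; the example.com URLs, the agent name SomeOtherBot and the helper name is_allowed are placeholders, not taken from the real site. It shows how a standards-following parser reads the rules, which is not a guarantee of how any particular bot behaves.

```python
from urllib.robotparser import RobotFileParser

# Lines 19-20 of the robots.txt as reported by the checker.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url according to robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Every path starts with "/", so the rule covers the whole site.
    print(is_allowed(ROBOTS_TXT, "MJ12bot", "http://www.example.com/"))
    print(is_allowed(ROBOTS_TXT, "MJ12bot", "http://www.example.com/shop/item.html"))
    # Agents without a matching group are not restricted by this block.
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://www.example.com/"))
```

Rules are matched as path prefixes, which is why the slash alone is enough to cover every URL on the site.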


Administrator
« Reply #10 on: Nov 27, 2006, 02:04:34 PM »

Maybe the correct one would be /* ?

Global Moderator
« Reply #11 on: Nov 27, 2006, 02:07:55 PM »

Quote:
Maybe the correct one would be /* ?

I don't know; this is the information from the tool:


The following block of code DISALLOWS the crawling of all files and directories to the following spiders/robots: MJ12bot


Administrator
« Reply #12 on: Nov 27, 2006, 02:10:54 PM »

Oh right, I thought it was meant to be an error, but it is ok.
