Crawlable pages and non-crawlable pages

Nikolas
Sun 18 September 2005, 12:23 pm GMT +0200
A lot of people ask me how they can get all their pages indexed by Google, and how the Google crawler decides whether or not to index a page.

Well, you can find answers to these questions here, but I will also try to explain a few things.

The Google spider doesn't like sessions. So if you have a page with a variable like ?sessid=...... it will probably not get indexed (e.g. url.com/index.php?sessionid=2).

Also, URLs with a variable called id plus other variables won't get indexed (e.g. url.com/index.php?id=2&something=1).

That's all. If you avoid these two patterns, the Google bot should crawl your whole site.
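
As a rough illustration, the two rules of thumb above could be checked with a small script like the one below. This is only a sketch: the function name crawl_risk and the list of session-like parameter names are my own assumptions, and it reflects the rules as described in this post, not Google's actual behaviour.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of parameter names that look like session identifiers.
# Extend it with whatever names your own scripts use.
SESSION_PARAMS = {"sessid", "sessionid", "sid", "phpsessid"}


def crawl_risk(url):
    """Return the reasons a URL might be skipped, per the two rules of thumb."""
    query = urlsplit(url).query
    # Collect the parameter names only, lowercased so the check ignores case.
    params = {name.lower() for name in parse_qs(query, keep_blank_values=True)}

    reasons = []
    # Rule 1: a session variable in the query string.
    if params & SESSION_PARAMS:
        reasons.append("session variable")
    # Rule 2: a variable called "id" together with other variables.
    if "id" in params and len(params) > 1:
        reasons.append("id plus other variables")
    return reasons


if __name__ == "__main__":
    for link in (
        "http://url.com/index.php?sessionid=2",
        "http://url.com/index.php?id=2&something=1",
        "http://www.mozunk.com/index.php?m=browse",
    ):
        print(link, crawl_risk(link) or "looks fine")
```

An empty list means neither pattern was found, so the link passes both checks.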

Adrevel
Sun 18 September 2005, 04:22 pm GMT +0200
:( I have sessions in my page links

For example:
http://www.mozunk.com/index.php?m=browse

How would you change this? My partner, who did all the PHP, set it up like that, so I will need to tell him.

Nikolas
Sun 18 September 2005, 09:01 pm GMT +0200
Those variables are not session variables.

I took a look at your site, and it has very well optimized URLs.

They can be fully indexed, and in fact 15600 pages of your site are already indexed at Google.

ag094
Mon 19 September 2005, 01:20 am GMT +0200
I remember reading on Sitepoint that URLs with variables can be indexed. Is there any information regarding this?

Dave
Mon 19 September 2005, 02:37 am GMT +0200
That's some good info, thanks. I have to take notes on all of this stuff for when I create my next site :)

Nikolas
Mon 19 September 2005, 10:38 am GMT +0200
Quote from ag094: "I remember reading on Sitepoint that URLs with variables can be indexed. Is there any information regarding this?"

The most official information on this can be found in the Google guidelines for webmasters.
