olaf
Thu 27 July 2006, 11:29 pm GMT +0300
Hello,
I have a table with these two important columns:
"folder" (int) and domain_name (varchar(255).
I accept there will be 1mio records some time and I need to search fast inside the column domain_name for a match like:
SELECT folder FROM table WHERE domain_name = 'www.webdigity.com';
is this a problem? or is it better to use a fulltext search?
Nikolas
Fri 28 July 2006, 02:17 pm GMT +0300
I am not sure for the full text, but if you want to make this query faster you can break the domain to 3 fields (subdomain, domain, tld) and make them index
olaf
Fri 28 July 2006, 02:22 pm GMT +0300
I am not sure for the full text, but if you want to make this query faster you can break the domain to 3 fields (subdomain, domain, tld) and make them index
This is a really good idea thanks
Nikolas
Fri 28 July 2006, 02:37 pm GMT +0300
Welcome ;)
olaf
Fri 28 July 2006, 02:40 pm GMT +0300
dou you think that this way 1-2 mio. records are not a problem?
Nikolas
Fri 28 July 2006, 02:42 pm GMT +0300
I used this type in a database of 2 mil and it was ok.
Of course for big tables using mysql 5 (instead of 4) is allways better :)
olaf
Fri 28 July 2006, 02:50 pm GMT +0300
I see another problem, there are this kind of entries possible:
subdomein.domein.nl
www.domein.nl 
domein.nl
above its possible to split them while using regex
but what with this kind of domain names:
maniche18.blogs.sapo.pt
I saw another URL's with more dots like
www.domein.wh.at.ev.er 
maybe its better to remove the www only?
olaf
Fri 28 July 2006, 02:51 pm GMT +0300
I used this type in a database of 2 mil and it was ok.
Of course for big tables using mysql 5 (instead of 4) is allways better :)
yes that's possible if this host works fine (you know which host I'm talking about)
Nikolas
Fri 28 July 2006, 02:57 pm GMT +0300
If you want to make it better, you should hold a table with all the valid tlds, and associate it with the domain.
This way you will keep the subdomain(s), the domain, and the id of the tld.
This is propably the faster, but needs some work...
olaf
Fri 28 July 2006, 03:17 pm GMT +0300
If you want to make it better, you should hold a table with all the valid tlds, and associate it with the domain.
This way you will keep the subdomain(s), the domain, and the id of the tld.
This is propably the faster, but needs some work...
yes, maybe this is the best solution... what about index the domain name without www?
Nikolas
Fri 28 July 2006, 03:21 pm GMT +0300
That would be better, but with no significant reduce in overhead.
olaf
Fri 28 July 2006, 09:51 pm GMT +0300
I am not sure for the full text, but if you want to make this query faster you can break the domain to 3 fields (subdomain, domain, tld) and make them index
Hey Nick,
I checked this but how do you parse an unkown domain name format like
domain.com
or domain.co.uk
or sub.domain.co.uk
to resolve the three parts?
checking the last part against 280 tld's?
Nikolas
Sat 29 July 2006, 02:07 am GMT +0300
First you get the tld by checkink the last characters of the domain with the tlds table,
Then you remove the tld part and split the domain with the . character, like this:
<?php
$dom = explode ( '.', $domainLeft );
$topDom = array_pop ( $dom );
$restDom = implode ( '.', $dom );
olaf
Sat 29 July 2006, 02:13 am GMT +0300
First you get the tld by checkink the last characters of the domain with the tlds table,
somthing like:
SELECT tld FROM table WHERE '
www.domein.co.uk 
' LIKE %tld
? maybe its too late but I don't get it... splitting all other parts is not the problem...
Nikolas
Sat 29 July 2006, 02:16 am GMT +0300
I guess that will work, but I am not really in a mood to think too :)
Check it out and let me know if it works (or not) so we can solve it tommorow :)
olaf
Sat 29 July 2006, 02:20 am GMT +0300
I guess that will work, but I am not really in a mood to think too :)
Check it out and let me know if it works (or not) so we can solve it tommorow :)
Ok I will go to bed now (to get fresh ideas the next day)
Nikolas
Sat 29 July 2006, 02:23 am GMT +0300
Good night man, see ya tomorow :) (now this thread looks more like a chat zone, lol )
olaf
Sat 29 July 2006, 12:23 pm GMT +0300
I played a little bit more with this domain split.
I think there are more poblems:
example:
www.domain.co.uk 
there are two mathches in my tld table: co.uk and uk
another problem is the sub domain because this is optional, how does the script knows that?
(by the way the select statemant doesn't work) ;)
Nikolas
Sat 29 July 2006, 12:44 pm GMT +0300
I will try to make this with pseudo - php code :
<?php
$domain = 'whatever.com';
$sql->query ( "SELECT tld FROM tlds ORDER BY LENGTH DESC" );//Meaning that it will give first the co.uk and then the uk
while ( $sql->next() )
{
if ( substr ( $domain, -1, strlen( $sql['tld'] ) = $sql['tld'] )
{
//You found the tld
$domainLeft = substr ( $domain, 0, ( strlen($sql['tld'] - strlen($domain) + 1 )); // It will left the whatever, without the extension and the .
if ( strpos ( $domainLeft, '.' ) === false ) //It has subdomain
{
$subDomain = '';
$finalDomain = $domainLeft;
}else{
$subDomain = explode ( '.', $domainLeft );
$finalDomain = array_pop ( $subDomain );
$subDomain = implode ( '.', $subDomain );
}
}
}
I am not sure if it is working, but the logic is correct (or I have to drink another coffee to think it better :) )
olaf
Sat 29 July 2006, 02:50 pm GMT +0300
Hey Nick,
Thank you that brings back the development time for this feature to only a few hours ;)
after some modification, this is the result:
<?php
$domain = "mail.finalwebsites.co.uk";
$res = mysql_query("SELECT tld FROM domain_info ORDER BY LENGTH(tld) DESC"); //Meaning that it will give first the co.uk and then the uk
while ($arr = mysql_fetch_assoc($res)) {
$tmp_tld = substr($domain, -strlen($arr['tld']));
if ($tmp_tld == $arr['tld']) { // You found the tld
$tld = $arr['tld'];
$domainLeft = substr($domain, 0, -(strlen($tld) + 1)); // It will left the whatever, without the extension and the .
if (strpos($domainLeft, ".") === false) { // It hasn't a subdomain
$subDomain = "";
$finalDomain = $domainLeft;
} else {
$domain_parts = explode(".", $domainLeft);
$finalDomain = array_pop($domain_parts); // select the domain and remove it from the array
$subDomain = implode(".", $domain_parts); // a subdomain can more then one parts seperated with dot's
}
echo "The tld is: ".$tld."<br>";
echo "the domain name is :".$finalDomain."<br>";
echo "the subdomain is: ";
echo (!empty($subDomain)) ? $subDomain : "n/a";
echo "<br>";
break;
}
}
by the way this function is a rare one (searched already the internet for a while). I like to post this script to the tutorial section, what about the tld table? Maybe we should send this data on request? (to members only)
Nikolas
Sat 29 July 2006, 02:58 pm GMT +0300
Sure you can post it :)
The tutorials area is only viewable by the members anyway ;)
olaf
Sat 29 July 2006, 11:42 pm GMT +0300
Now I have splitted the domain in the columns:
sub_domain
domain
tld
do I have to set indexes for all three columns?
Nikolas
Sun 30 July 2006, 04:15 pm GMT +0300
You need a primary key for all of those (check them all together and add primary key from phpMyAdmin)
olaf
Sun 30 July 2006, 04:20 pm GMT +0300
got an error because the entries are not unique sinds I have records for different image sizes for the same domain name... do I have to change that?
Nikolas
Sun 30 July 2006, 04:27 pm GMT +0300
I don't got that. You keep for each domain one photo?
If you do then all 4 fields must be kept as a primary key.
olaf
Sun 30 July 2006, 04:31 pm GMT +0300
I don't got that. You keep for each domain one photo?
If you do then all 4 fields must be kept as a primary key.
yes I save max 4 thumbs for each domain (Chris is using two formats)
Now I understand how this works with the primary index, thanks (these 4 columns are the values I use to select a thumbnail)
Nikolas
Sun 30 July 2006, 04:33 pm GMT +0300
these 4 columns are the values I use to select a thumbnail
That's the point. You got it :)