Re: Parse domain from URL


 



On 06/06/07, Brad Fuller <bfuller@xxxxxxxxxxxxxxxx> wrote:
Daniel Brown wrote:
> On 6/6/07, Brad Fuller <bfuller@xxxxxxxxxxxxxxxx> wrote:
>>
>> I need to strip out a domain name from a URL, and ignore subdomains
>> (like www)
>>
>> I can use parse_url to get the hostname. And my first thought was to
>> take the last 2 segments of the hostname to get the domain.  So if the
>> URL is http://www.example.com/
>> Then the domain is "example.com."   If the URL is
>> http://example.org/ then the domain is "example.org."
>>
>> This seemed to work perfectly until I came across a URL like
>> http://www.example.co.uk/ My script thinks the domain is "co.uk."
>>
>> So I added a bit of code to account for this, basically if the 2nd to
>> last segment of the hostname is "co" then take the last 3 segments.
>>
>> Then I stumbled across a URL like http://www.example.com.au/
>>
>> So it occurred to me that this is not the best solution, unless I
>> have a definitive list of all exceptions to go off of.
>>
>> Does anyone have any suggestions?
>>
>> Any advice is much appreciated.
>
>     Well, it's not very clean, but if you just need to remove
> the subdomain/CNAME from the domain....
>
> <?php
> // Drop the first label of the host name, assuming it is the subdomain.
> $domsplit = explode('.', $_SERVER['SERVER_NAME']);
> $domain = implode('.', array_slice($domsplit, 1));
> echo $domain;
> ?>
>
>     There's probably a much better way to do it, but in the
> interest of a quick response, that's one way.


Yes, that's basically what my code already does.

The problem is: what if the URL is "http://yahoo.co.uk/"? (Note the lack
of a subdomain.)

Your script thinks that the domain is "co.uk".  Just like my existing code
does.

So we can't count on taking the last 2 segments.  And we can't count on
ignoring the first segment.  (The subdomain could be anything, not just www)
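
The failure is easy to reproduce. Here is a minimal sketch of the
"last 2 segments" approach; the function name last_two_labels is made up
for illustration and is not anyone's actual code from this thread:

```php
<?php
// Hypothetical helper: keep only the last two labels of the host name.
function last_two_labels($url) {
    $host = parse_url($url, PHP_URL_HOST);
    $labels = explode('.', $host);
    return implode('.', array_slice($labels, -2));
}

print last_two_labels("http://www.example.com/") . "\n";    // example.com
print last_two_labels("http://yahoo.co.uk/") . "\n";        // co.uk (wrong)
print last_two_labels("http://www.example.com.au/") . "\n"; // com.au (wrong)
```

The string itself carries no hint about where the registered name ends
and the public suffix begins.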

In that case you can't do it by parsing alone; you need to use DNS.

<?php
function get_domain ($hostname) {
 // Ask for the A record, but use the authority section of the reply:
 // its NS records are named after the zone the host belongs to.
 $authns = array();
 $addtl = array();
 dns_get_record($hostname, DNS_A, $authns, $addtl);
 // Some resolvers return no authority section; report that as false.
 return isset($authns[0]['host']) ? $authns[0]['host'] : false;
}

print get_domain("www.google.com") . "\n";
print get_domain("google.com") . "\n";
print get_domain("www.google.co.uk") . "\n";
print get_domain("google.co.uk") . "\n";
print get_domain("google.com.au") . "\n";
print get_domain("www.google.com.au") . "\n";

/* result
google.com
google.com
google.co.uk
google.co.uk
google.com.au
google.com.au
*/
?>
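
If DNS lookups aren't an option, the "definitive list of exceptions" idea
from earlier in the thread can be sketched as below. This is only a rough
illustration: the function name registrable_domain and the short suffix
array are assumptions made up for the example. A real list would have to
come from a maintained source such as the Public Suffix List, whose
wildcard and exception rules this sketch does not handle.

```php
<?php
// Hypothetical helper; $suffixes is a hand-written sample, not a complete list.
function registrable_domain($hostname, $suffixes = array('co.uk', 'com.au', 'com', 'org', 'uk', 'au')) {
    $hostname = strtolower($hostname);
    // A bare suffix such as "co.uk" has no registered name in front of it.
    if (in_array($hostname, $suffixes, true)) {
        return false;
    }
    $labels = explode('.', $hostname);
    $count = count($labels);
    // Try the longest candidate suffix first, so "co.uk" wins over "uk".
    for ($i = 1; $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, $suffixes, true)) {
            // Keep exactly one label in front of the matched suffix.
            return implode('.', array_slice($labels, $i - 1));
        }
    }
    return false; // no known suffix matched
}

print registrable_domain("www.google.co.uk") . "\n";
print registrable_domain("google.com.au") . "\n";
```

Unlike the DNS version, this needs no network access, but it is only as
good as the suffix list you feed it.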

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

