Re: Parse domain from URL

"Daniel Brown" <parasane@xxxxxxxxx> · Thu, 7 Jun 2007 11:00:43 -0400

On 6/7/07, Robin Vickery <robinv@xxxxxxxxx> wrote:
On 06/06/07, Brad Fuller <bfuller@xxxxxxxxxxxxxxxx> wrote:
> Daniel Brown wrote:
> > On 6/6/07, Brad Fuller <bfuller@xxxxxxxxxxxxxxxx> wrote:
> >>
> >> I need to strip out a domain name from a URL, and ignore subdomains
> >> (like www)
> >>
> >> I can use parse_url to get the hostname. And my first thought was to
> >> take the last 2 segments of the hostname to get the domain. So if the
> >> URL is http://www.example.com/ then the domain is "example.com". If the
> >> URL is http://example.org/ then the domain is "example.org".
> >>
> >> This seemed to work perfectly until I came across a URL like
> >> http://www.example.co.uk/ where my script thinks the domain is "co.uk".
> >>
> >> So I added a bit of code to account for this, basically if the 2nd to
> >> last segment of the hostname is "co" then take the last 3 segments.
> >>
> >> Then I stumbled across a URL like http://www.example.com.au/
> >>
> >> So it occurred to me that this is not the best solution, unless I
> >> have a definitive list of all exceptions to go off of.
> >>
> >> Does anyone have any suggestions?
> >>
> >> Any advice is much appreciated.
> >
> >     Well, it's not very clean, but if you just need to remove
> > the subdomain/CNAME from the domain....
> >
> > <?php
> > // SERVER_NAME has no scheme, so parse_url() returns it under 'path'.
> > $hostname = parse_url($_SERVER['SERVER_NAME']);
> > $domsplit = explode('.', $hostname['path']);
> > // Drop the first label (the subdomain) and rejoin the rest with dots.
> > $domain = implode('.', array_slice($domsplit, 1));
> > echo $domain;
> > ?>
> >
> >     There's probably a much better way to do it, but in the
> > interest of a quick response, that's one way.
>
>
> Yes, that's basically what my code already does.
>
> The problem is: what if the URL is "http://yahoo.co.uk/" (note the lack
> of a subdomain)?
>
> Your script thinks that the domain is "co.uk".  Just like my existing code
> does.
>
> So we can't count on taking the last 2 segments.  And we can't count on
> ignoring the first segment.  (The subdomain could be anything, not just www)

In that case you can't do it by parsing alone; you need to use DNS.

<?php
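function get_domain ($hostname) {
  // $authns is filled by reference with the authoritative name server
  // records for the zone that answers for $hostname.
  dns_get_record($hostname, DNS_A, $authns, $addt);
  // Not every resolver returns authority records, so guard the lookup.
  return isset($authns[0]['host']) ? $authns[0]['host'] : null;
}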

print get_domain("www.google.com") . "\n";
print get_domain("google.com") . "\n";
print get_domain("www.google.co.uk") . "\n";
print get_domain("google.co.uk") . "\n";
print get_domain("google.co.uk") . "\n";
print get_domain("google.com.au") . "\n";
print get_domain("www.google.com.au") . "\n";

/* result
google.com
google.com
google.co.uk
google.co.uk
google.co.uk
google.com.au
google.com.au
*/
?>
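
For comparison, the "definitive list" idea Brad raised earlier in the thread can be sketched without DNS. This is only a minimal sketch, assuming a small hand-maintained list of multi-label suffixes: the function name registrable_domain() and the default $suffixes array are hypothetical, and a real list (the Public Suffix List, for example) is far longer.

```php
<?php
// Hypothetical helper: return the registrable domain of a URL by checking
// its host against a hand-maintained list of multi-label suffixes.
function registrable_domain($url, array $suffixes = array('co.uk', 'com.au'))
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component could be parsed from the URL
    }
    $host   = strtolower($host);
    $labels = explode('.', $host);

    // Default assumption: the suffix is a single label (com, org, ...).
    $suffixLabels = 1;
    foreach ($suffixes as $suffix) {
        if ($host === $suffix) {
            return null; // the host is itself a suffix, not a domain
        }
        // Match only on a label boundary, e.g. ".co.uk" and not "xco.uk".
        $needle = '.' . $suffix;
        if (substr($host, -strlen($needle)) === $needle) {
            $suffixLabels = max($suffixLabels, count(explode('.', $suffix)));
        }
    }

    if (count($labels) <= $suffixLabels) {
        return null; // nothing left in front of the suffix
    }
    // Keep the suffix plus the one label in front of it.
    return implode('.', array_slice($labels, -($suffixLabels + 1)));
}

echo registrable_domain('http://www.example.co.uk/') . "\n"; // example.co.uk
echo registrable_domain('http://yahoo.co.uk/') . "\n";       // yahoo.co.uk
echo registrable_domain('http://www.example.com/') . "\n";   // example.com
```

The trade-off against the DNS approach is that this needs no network access, but it is only as good as the suffix list it is given.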

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

   Wow, great job, Robin. I didn't even know about the
dns_get_record() function myself until just now.  I can already think
of a few places to use it: email validation, for one.
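
The email validation idea might look roughly like this. It is a minimal sketch, assuming "valid" means nothing more than "the domain after the @ publishes at least one MX record". The function name email_domain_has_mx() and the optional $lookup parameter are hypothetical; $lookup exists only so the DNS call can be replaced with a stub.

```php
<?php
// Hypothetical helper: check whether the domain part of an address has MX records.
// $lookup, if given, is called with the domain and must return an array of records.
function email_domain_has_mx($email, $lookup = null)
{
    $at = strrpos($email, '@');
    // Reject addresses with no "@", nothing before it, or nothing after it.
    if ($at === false || $at === 0 || $at === strlen($email) - 1) {
        return false;
    }
    $domain = substr($email, $at + 1);

    if ($lookup === null) {
        // dns_get_record() returns an array of records, or false on failure.
        $records = @dns_get_record($domain, DNS_MX);
    } else {
        $records = call_user_func($lookup, $domain);
    }
    return is_array($records) && count($records) > 0;
}
```

Note that an MX record only shows that the domain accepts mail at all; it says nothing about whether the particular mailbox exists.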

--
Daniel P. Brown
[office] (570-) 587-7080 Ext. 272
[mobile] (570-) 766-8107
