Re: Help: Validate Domain Name by Regular Express

Right, RFC 1034 allows any number of dot-separated labels, as long as the total length does not exceed 255.
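
For what it's worth, the 255 in the RFC and the 253 used in the script quoted below look like the same limit counted two ways: the RFC counts every label octet plus one length octet per label, and the root's zero-length octet counts as well, so a name written without the trailing dot can be at most 253 characters. A rough sketch of that arithmetic (wire_length() is a made-up helper name, and it assumes a non-empty name with no empty labels):

```php
<?php
// Sketch only: wire_length() is a hypothetical helper, not from the thread.
// Assumes a non-empty name whose labels are all non-empty.
function wire_length($name)
{
    $labels = explode('.', $name);
    if (end($labels) === '') {
        array_pop($labels);             // a trailing dot leaves an empty last element
    }

    $total = 1;                         // zero-length octet of the root label
    foreach ($labels as $label) {
        $total += 1 + strlen($label);   // one length octet plus the label itself
    }

    return $total;
}

var_dump(wire_length('www.example.com'));   // int(17)
```

So a 15-character name takes 17 octets, and the 255 ceiling is reached at 253 characters of text.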

On 01/09/2011 01:21 AM, TR Shaw wrote:
On Jan 8, 2011, at 12:09 PM, Ashley Sheridan wrote:

On Sat, 2011-01-08 at 16:55 +0800, WalkinRaven wrote:

PHP 5.3 PCRE

Regular expression to match the domain name format according to RFC 1034 -
DOMAIN NAMES - CONCEPTS AND FACILITIES

/^
(
   [a-z]                 |
   [a-z] (?:[a-z]|[0-9]) |
   [a-z] (?:[a-z]|[0-9]|\-){1,61} (?:[a-z]|[0-9])
) # One label

(?:\.(?1))*+        # More labels
\.?                 # Root domain name
$/iDx

This rule matches only <label> and <label>. but not <label>.<label>...

I don't know what's wrong with it.

Thank you.
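
A likely explanation, offered as a sketch rather than a definitive answer: the first alternative, [a-z], succeeds on a single character, and once it has matched inside (?1) nothing can backtrack into it to try the longer alternatives. The possessive *+ forbids that, and the PCRE bundled with PHP 5.3 also treats a subpattern call like (?1) as atomic. Writing the label as one greedy alternative avoids the problem. The function name is_domain_syntax() below is made up for illustration, and the pattern still does not enforce the 255 limit on the whole name:

```php
<?php
// Sketch only: is_domain_syntax() is a hypothetical name, not from the thread.
// One label = a letter, optionally followed by up to 61 letters, digits or
// hyphens and then a final letter or digit (63 characters at most).
function is_domain_syntax($name)
{
    $pattern = '/^
        ( [a-z] (?: [a-z0-9-]{0,61} [a-z0-9] )? )   # One label
        (?: \. (?1) )*+                             # More labels
        \.?                                         # Root domain name
    $/iDx';

    return preg_match($pattern, $name) === 1;
}

var_dump(is_domain_syntax('www.example.com'));   // bool(true)
var_dump(is_domain_syntax('example-.com'));      // bool(false)
```

Because the label is now a single greedy alternative, each call to (?1) takes the whole label before the next dot is tried, so the atomic behaviour no longer gets in the way.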



I think trying to do all of this in one regex will prove more trouble
than it's worth. Maybe breaking it down into something like this:

<?php
$domain = "www.ashleysheridan.co.uk";
$valid = false;

$tlds = array('aero', 'asia', 'biz', 'cat', 'com', 'coop', 'edu', 'gov',
'info', 'int', 'jobs', 'mil', 'mobi', 'museum', 'name', 'net', 'org',
'pro', 'tel', 'travel', 'xxx', 'ac', 'ad', 'ae', 'af', 'ag', 'ai', 'al',
'am', 'an', 'ao', 'aq', 'ar', 'as', 'at', 'au', 'aw', 'ax', 'az', 'ba',
'bb', 'bd', 'be', 'bf', 'bg', 'bh', 'bi', 'bj', 'bm', 'bn', 'bo', 'br',
'bs', 'bt', 'bv', 'bw', 'by', 'bz', 'ca', 'cc', 'cd', 'cf', 'cg', 'ch',
'ci', 'ck', 'cl', 'cm', 'cn', 'co', 'cr', 'cu', 'cv', 'cx', 'cy', 'cz',
'de', 'dj', 'dk', 'dm', 'do', 'dz', 'ec', 'ee', 'eg', 'er', 'es', 'et',
'eu', 'fi', 'fj', 'fk', 'fm', 'fo', 'fr', 'ga', 'gb', 'gd', 'ge', 'gf',
'gg', 'gh', 'gi', 'gl', 'gm', 'gn', 'gp', 'gq', 'gr', 'gs', 'gt', 'gu',
'gw', 'gy', 'hk', 'hm', 'hn', 'hr', 'ht', 'hu', 'id', 'ie', 'il', 'im',
'in', 'io', 'iq', 'ir', 'is', 'it', 'je', 'jm', 'jo', 'jp', 'ke', 'kg',
'kh', 'ki', 'km', 'kn', 'kp', 'kr', 'kw', 'ky', 'kz', 'la', 'lb', 'lc',
'li', 'lk', 'lr', 'ls', 'lt', 'lu', 'lv', 'ly', 'ma', 'mc', 'md', 'me',
'mg', 'mh', 'mk', 'ml', 'mm', 'mn', 'mo', 'mp', 'mq', 'mr', 'ms', 'mt',
'mu', 'mv', 'mw', 'mx', 'my', 'mz', 'na', 'nc', 'ne', 'nf', 'ng', 'ni',
'nl', 'no', 'np', 'nr', 'nu', 'nz', 'om', 'pa', 'pe', 'pf', 'pg', 'ph',
'pk', 'pl', 'pm', 'pn', 'pr', 'ps', 'pt', 'pw', 'py', 'qa', 're', 'ro',
'rs', 'ru', 'rw', 'sa', 'sb', 'sc', 'sd', 'se', 'sg', 'sh', 'si', 'sj',
'sk', 'sl', 'sm', 'sn', 'so', 'sr', 'st', 'su', 'sv', 'sy', 'sz', 'tc',
'td', 'tf', 'tg', 'th', 'tj', 'tk', 'tl', 'tm', 'tn', 'to', 'tp', 'tr',
'tt', 'tv', 'tw', 'tz', 'ua', 'ug', 'uk', 'us', 'uy', 'uz', 'va', 'vc',
've', 'vg', 'vi', 'vn', 'vu', 'wf', 'ws', 'ye', 'yt', 'za', 'zm',
'zw', );


if(strlen($domain) <= 253)
{
	$labels = explode('.', $domain);
	if(in_array($labels[count($labels)-1], $tlds))
	{
		for($i=0; $i<count($labels) -1; $i++)
		{
			// a label is wrong if it is too long, breaks the LDH rule,
			// or consists only of digits
			if(strlen($labels[$i]) > 63
				|| !preg_match('/^[a-z0-9](?:[a-z0-9\-]*[a-z0-9])?$/D', $labels[$i])
				|| preg_match('/^[0-9]+$/D', $labels[$i]))
			{
				$valid = false;
				break;	// no point continuing if one label is wrong
			}
			else
			{
				$valid = true;
			}
		}
	}
}

var_dump($valid);


This checks the last label against the list of TLDs, and each of the
other labels against the standard a-z, 0-9 and hyphen rule given as the
preferred syntax for a label (the LDH rule), with the extra condition
that a label can't start or end with a hyphen (oddly enough it doesn't
mention starting with a digit!)

Also, each label is checked to ensure it doesn't run over 63 characters,
and the whole thing isn't over 253 characters. Lastly, each label is
checked to ensure it doesn't completely consist of digits.
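
Since the per-label checks are easier to try out in isolation, here is a rough sketch of them pulled into a helper and run against a handful of labels. label_ok() is a hypothetical name, and two things in it are assumptions rather than part of the script: it ignores case, and it accepts single-character labels.

```php
<?php
// Sketch only: label_ok() is a hypothetical helper, not part of the script.
function label_ok($label)
{
    $len = strlen($label);
    if ($len < 1 || $len > 63) {
        return false;   // empty, or over 63 characters
    }
    if (!preg_match('/^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/iD', $label)) {
        return false;   // character outside a-z, 0-9, hyphen, or a hyphen at either end
    }
    if (preg_match('/^[0-9]+$/D', $label)) {
        return false;   // consists only of digits
    }
    return true;
}

$samples = array('www', 'a', 'a1-b2', '-abc', 'abc-', '12345', str_repeat('a', 64));
foreach ($samples as $label) {
    // print at most 12 characters of the label so long ones stay readable
    echo substr($label, 0, 12), ' => ', label_ok($label) ? 'ok' : 'rejected', "\n";
}
```

The first three samples should come out as ok and the rest as rejected.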

I've tested it only with my domain so far, but it should work fairly
well. As I said before, I couldn't think of a way to do it all with one
regex. It could probably be done, but would you really want to create a
huge and difficult to read/understand expression just because it's
possible?
Ash

I doubt it's possible, since the ccTLDs have valid domain names with three or more dotted parts. You should see .us. And .uk doesn't follow the same ccTLD rules as .tk, for example.

Now, if the purpose is to write a regex for a host name then that's a different story.

Tom

--
Me at:
http://WalkinRaven.name


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


