On Sat, 2011-01-08 at 16:55 +0800, WalkinRaven wrote:
PHP 5.3 PCRE
Regular expression to match the domain name format according to RFC 1034 -
DOMAIN NAMES - CONCEPTS AND FACILITIES
/^
(
[a-z] |
[a-z] (?:[a-z]|[0-9]) |
[a-z] (?:[a-z]|[0-9]|\-){1,61} (?:[a-z]|[0-9]) ) # One label
(?:\.(?1))*+ # More labels
\.? # Root domain name
$/iDx
This rule matches only <label> and <label>. but not <label>.<label>...
I don't know what is wrong with it.
Thank you.
I think trying to do all of this in one regex will prove more trouble
than it's worth. Maybe breaking it down into something like this:
<?php
$domain = "www.ashleysheridan.co.uk";
$valid = false;
$tlds = array('aero', 'asia', 'biz', 'cat', 'com', 'coop', 'edu', 'gov',
'info', 'int', 'jobs', 'mil', 'mobi', 'museum', 'name', 'net', 'org',
'pro', 'tel', 'travel', 'xxx', 'ac', 'ad', 'ae', 'af', 'ag', 'ai', 'al',
'am', 'an', 'ao', 'aq', 'ar', 'as', 'at', 'au', 'aw', 'ax', 'az', 'ba',
'bb', 'bd', 'be', 'bf', 'bg', 'bh', 'bi', 'bj', 'bm', 'bn', 'bo', 'br',
'bs', 'bt', 'bv', 'bw', 'by', 'bz', 'ca', 'cc', 'cd', 'cf', 'cg', 'ch',
'ci', 'ck', 'cl', 'cm', 'cn', 'co', 'cr', 'cu', 'cv', 'cx', 'cy', 'cz',
'de', 'dj', 'dk', 'dm', 'do', 'dz', 'ec', 'ee', 'eg', 'er', 'es', 'et',
'eu', 'fi', 'fj', 'fk', 'fm', 'fo', 'fr', 'ga', 'gb', 'gd', 'ge', 'gf',
'gg', 'gh', 'gi', 'gl', 'gm', 'gn', 'gp', 'gq', 'gr', 'gs', 'gt', 'gu',
'gw', 'gy', 'hk', 'hm', 'hn', 'hr', 'ht', 'hu', 'id', 'ie', 'il', 'im',
'in', 'io', 'iq', 'ir', 'is', 'it', 'je', 'jm', 'jo', 'jp', 'ke', 'kg',
'kh', 'ki', 'km', 'kn', 'kp', 'kr', 'kw', 'ky', 'kz', 'la', 'lb', 'lc',
'li', 'lk', 'lr', 'ls', 'lt', 'lu', 'lv', 'ly', 'ma', 'mc', 'md', 'me',
'mg', 'mh', 'mk', 'ml', 'mm', 'mn', 'mo', 'mp', 'mq', 'mr', 'ms', 'mt',
'mu', 'mv', 'mw', 'mx', 'my', 'mz', 'na', 'nc', 'ne', 'nf', 'ng', 'ni',
'nl', 'no', 'np', 'nr', 'nu', 'nz', 'om', 'pa', 'pe', 'pf', 'pg', 'ph',
'pk', 'pl', 'pm', 'pn', 'pr', 'ps', 'pt', 'pw', 'py', 'qa', 're', 'ro',
'rs', 'ru', 'rw', 'sa', 'sb', 'sc', 'sd', 'se', 'sg', 'sh', 'si', 'sj',
'sk', 'sl', 'sm', 'sn', 'so', 'sr', 'st', 'su', 'sv', 'sy', 'sz', 'tc',
'td', 'tf', 'tg', 'th', 'tj', 'tk', 'tl', 'tm', 'tn', 'to', 'tp', 'tr',
'tt', 'tv', 'tw', 'tz', 'ua', 'ug', 'uk', 'us', 'uy', 'uz', 'va', 'vc',
've', 'vg', 'vi', 'vn', 'vu', 'wf', 'ws', 'ye', 'yt', 'za', 'zm',
'zw', );
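// the whole name must not exceed 253 characters
if(strlen($domain) <= 253)
{
    $labels = explode('.', $domain);
    // the last label must be a known TLD
    if(in_array($labels[count($labels)-1], $tlds))
    {
        for($i=0; $i<count($labels) -1; $i++)
        {
            // a label is wrong if it is longer than 63 characters, breaks the
            // letters/digits/hyphen rule, or consists of digits only
            if(strlen($labels[$i]) > 63
                || !preg_match('/^[a-z0-9](?:[a-z0-9\-]*[a-z0-9])?$/', $labels[$i])
                || preg_match('/^[0-9]+$/', $labels[$i]))
            {
                $valid = false;
                break; // no point continuing if one label is wrong
            }
            else
            {
                $valid = true;
            }
        }
    }
}
var_dump($valid);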
This matches the last label against the list of TLDs, and each label
before it against the standard a-z, 0-9 and hyphen rule given as the
preferred syntax for a label (the LDH rule), with the added condition
that the first and last characters of a label aren't hyphens. (RFC 1034
itself says a label must start with a letter; RFC 1123 later relaxed
that to allow a leading digit, which is what the check above permits.)
Also, each label is checked to ensure it doesn't run over 63 characters,
and the whole thing isn't over 253 characters. Lastly, each label is
checked to ensure it doesn't completely consist of digits.
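To make the per-label rules easier to try against more inputs, here is a minimal sketch that wraps them in a function. The name is_valid_label() is something I made up for illustration, not a PHP built-in, and it assumes the label has already been lower-cased:

```php
<?php
// Hypothetical helper (not a PHP built-in): checks a single label against
// the rules described above. Assumes the label is already lower-case.
function is_valid_label($label)
{
    // between 1 and 63 characters
    if (strlen($label) < 1 || strlen($label) > 63) {
        return false;
    }
    // letters, digits and hyphens only, with no hyphen at either end
    if (!preg_match('/^[a-z0-9](?:[a-z0-9\-]*[a-z0-9])?$/D', $label)) {
        return false;
    }
    // must not consist of digits only
    return !preg_match('/^[0-9]+$/D', $label);
}

var_dump(is_valid_label('www'));               // bool(true)
var_dump(is_valid_label('-bad'));              // bool(false)
var_dump(is_valid_label('12345'));             // bool(false)
var_dump(is_valid_label(str_repeat('a', 64))); // bool(false)
```

Keeping each rule as its own early return means a failing label can be traced to the exact rule it broke.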
I've tested it only with my domain so far, but it should work fairly
well. As I said before, I couldn't think of a way to do it all with one
regex. It could probably be done, but would you really want to create a
huge and difficult to read/understand expression just because it's
possible?
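For comparison, this is roughly what a single expression might look like if you drop the TLD and all-digits checks. It is only a sketch under my own assumptions: labels start with a letter as in the original pattern, a lookahead caps the overall length at 253 characters plus an optional trailing dot, and the label pattern is written out twice instead of being called with (?1). The function name is_rfc1034_name() is made up for the example:

```php
<?php
// Hypothetical helper: one-regex check of the overall name format.
// Assumptions: letter-first labels, 1-63 characters per label,
// at most 253 characters overall, optional trailing root dot.
function is_rfc1034_name($name)
{
    $pattern = '/^(?=.{1,253}\.?$)'                        // overall length
             . '[a-z](?:[a-z0-9\-]{0,61}[a-z0-9])?'        // first label
             . '(?:\.[a-z](?:[a-z0-9\-]{0,61}[a-z0-9])?)*' // more labels
             . '\.?$/iD';                                  // root dot
    return preg_match($pattern, $name) === 1;
}

var_dump(is_rfc1034_name('www.ashleysheridan.co.uk')); // bool(true)
var_dump(is_rfc1034_name('example.com.'));             // bool(true)
var_dump(is_rfc1034_name('bad-.com'));                 // bool(false)
var_dump(is_rfc1034_name('a..b'));                     // bool(false)
```

Even at this size it is harder to read than the loop, and it still doesn't know anything about TLDs.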