Re: email address syntax checker

Ashley Sheridan <ash@xxxxxxxxxxxxxxxxxxxx> · Sun, 23 Jan 2011 20:31:20 +0000

On Sun, 2011-01-23 at 14:59 -0500, Govinda wrote:

> > Peter Lind wrote:
> > [snip]
> >> if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
> >>     echo "Bad user! Bad user!";
> >> }
> >> 
> >> Regards
> >> Peter
> > 
> > 
> > thanks peter... wish I would have known about filter_var before
> > writing the other checkers. ;-)
> 
> 
> Hi D   :-)
> 
> I was following along.. also felt pleased to be introduced to filter_var ... and then happened to see this:
> 
> http://us3.php.net/manual/en/function.filter-var.php
> the user-contributed note, headed with:
> php dot 5 dot leenoble at SPAMMENOTspamgourmet dot net 18-Dec-2009 10:01
> 
> "Note that FILTER_VALIDATE_EMAIL used in isolation is not enough for most (if not all) web based registration forms.
> 
> It will happily pronounce "yourname" as valid because presumably the "@localhost" is implied, so you still have to check that the domain portion of the address exists.
> "
> 
> So I am surprised Peter recommended it.  (?)
> 
> AFAICT, I should stick with what I was using:
> 
> $emailPattern = '/^[\w\.\-_\+]+@[\w-]+(\.\w{2,4})+$/i'; //
> $emailReplacement = 'theEmailAppearsValid';
> $emailChecker = preg_replace($emailPattern, $emailReplacement, $emailToCheck);
> if($emailChecker == 'theEmailAppearsValid') {
> 	//--theEmailLooksValid, so use it...
> } else {
> 	//--theEmailLooksBad, so do not use it...
> }
> 
> 
> ------------
> Govinda
> 
> 

A few posts back I posted a solution to a question about validating
domain names. You could use that to validate the portion after the last
'@' symbol (as an @ could validly occur in the local part of the email
address) and then validate the front part another way (I can't write all
the code for you ;p )
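
As a rough sketch of the split described above (the function name splitEmailAtLastAt is made up for illustration and isn't from the earlier thread), something like this would separate the local part from the domain at the last '@', leaving each half to be checked on its own:

```php
<?php
// Hypothetical helper, not from the thread: split an address at the LAST '@'
// so that an '@' inside a quoted local part doesn't confuse the split.
function splitEmailAtLastAt($email)
{
    $pos = strrpos($email, '@');

    // No '@' at all, nothing before it, or nothing after it: can't split.
    if ($pos === false || $pos === 0 || $pos === strlen($email) - 1) {
        return null;
    }

    // array(local part, domain part)
    return array(substr($email, 0, $pos), substr($email, $pos + 1));
}

// The domain half is what would be fed to a domain check.
var_dump(splitEmailAtLastAt('"odd@local"@example.co.uk'));

// The "yourname" case from the manual note has no '@', so this returns null.
var_dump(splitEmailAtLastAt('yourname'));
```

This only does the split; it doesn't validate either half.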

The domain code I used before was:

<?php
$domain = "www.ashleysheridan.co.uk";
$valid = false;

$tlds = array('aero', 'asia', 'biz', 'cat', 'com', 'coop', 'edu', 'gov',
'info', 'int', 'jobs', 'mil', 'mobi', 'museum', 'name', 'net', 'org',
'pro', 'tel', 'travel', 'xxx', 'ac', 'ad', 'ae', 'af', 'ag', 'ai', 'al',
'am', 'an', 'ao', 'aq', 'ar', 'as', 'at', 'au', 'aw', 'ax', 'az', 'ba',
'bb', 'bd', 'be', 'bf', 'bg', 'bh', 'bi', 'bj', 'bm', 'bn', 'bo', 'br',
'bs', 'bt', 'bv', 'bw', 'by', 'bz', 'ca', 'cc', 'cd', 'cf', 'cg', 'ch',
'ci', 'ck', 'cl', 'cm', 'cn', 'co', 'cr', 'cu', 'cv', 'cx', 'cy', 'cz',
'de', 'dj', 'dk', 'dm', 'do', 'dz', 'ec', 'ee', 'eg', 'er', 'es', 'et',
'eu', 'fi', 'fj', 'fk', 'fm', 'fo', 'fr', 'ga', 'gb', 'gd', 'ge', 'gf',
'gg', 'gh', 'gi', 'gl', 'gm', 'gn', 'gp', 'gq', 'gr', 'gs', 'gt', 'gu',
'gw', 'gy', 'hk', 'hm', 'hn', 'hr', 'ht', 'hu', 'id', 'ie', 'il', 'im',
'in', 'io', 'iq', 'ir', 'is', 'it', 'je', 'jm', 'jo', 'jp', 'ke', 'kg',
'kh', 'ki', 'km', 'kn', 'kp', 'kr', 'kw', 'ky', 'kz', 'la', 'lb', 'lc',
'li', 'lk', 'lr', 'ls', 'lt', 'lu', 'lv', 'ly', 'ma', 'mc', 'md', 'me',
'mg', 'mh', 'mk', 'ml', 'mm', 'mn', 'mo', 'mp', 'mq', 'mr', 'ms', 'mt',
'mu', 'mv', 'mw', 'mx', 'my', 'mz', 'na', 'nc', 'ne', 'nf', 'ng', 'ni',
'nl', 'no', 'np', 'nr', 'nu', 'nz', 'om', 'pa', 'pe', 'pf', 'pg', 'ph',
'pk', 'pl', 'pm', 'pn', 'pr', 'ps', 'pt', 'pw', 'py', 'qa', 're', 'ro',
'rs', 'ru', 'rw', 'sa', 'sb', 'sc', 'sd', 'se', 'sg', 'sh', 'si', 'sj',
'sk', 'sl', 'sm', 'sn', 'so', 'sr', 'st', 'su', 'sv', 'sy', 'sz', 'tc',
'td', 'tf', 'tg', 'th', 'tj', 'tk', 'tl', 'tm', 'tn', 'to', 'tp', 'tr',
'tt', 'tv', 'tw', 'tz', 'ua', 'ug', 'uk', 'us', 'uy', 'uz', 'va', 'vc',
've', 'vg', 'vi', 'vn', 'vu', 'wf', 'ws', 'ye', 'yt', 'za', 'zm',
'zw', );

if(strlen($domain) <= 253)	// whole name must be 253 characters or fewer
{
	$labels = explode('.', $domain);
	if(in_array($labels[count($labels)-1], $tlds))
	{
		for($i=0; $i<count($labels) -1; $i++)
		{
			// a label is bad if it's over 63 chars, or doesn't start and end with a letter/digit
			if(strlen($labels[$i]) > 63 || !preg_match('/^[a-z0-9]([a-z0-9\-]*[a-z0-9])?$/', $labels[$i]))
			{
				$valid = false;
				break;	// no point continuing if one label is wrong
			}
			else
			{
				$valid = true;
			}
		}
	}
}

var_dump($valid);

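// For comparison: FILTER_VALIDATE_URL expects a scheme (e.g. http://), so a bare hostname fails it
var_dump(filter_var("www.test.co.uk", FILTER_VALIDATE_URL));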

It looks like a long chunk, but it should validate the domain part
successfully. It doesn't handle the full Unicode-style (internationalised)
domains, as only a handful of services use them. Most servers tend to
default to using and displaying punycode instead, and this will still
validate those.
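
To illustrate the punycode point, here's a small sketch assuming a label pattern along the lines of the one in the loop (the variable names are just for this example). The punycode form of a label is plain ASCII, so it passes; the raw Unicode form does not:

```php
<?php
// Assumed label rule: letters, digits and hyphens, no hyphen at either end.
$labelPattern = '/^[a-z0-9]([a-z0-9\-]*[a-z0-9])?$/';

$punycodeLabel = 'xn--mnchen-3ya';	// punycode (ASCII) form of "münchen"
$unicodeLabel  = 'münchen';		// raw UTF-8 form of the same label

$punycodeOk = preg_match($labelPattern, $punycodeLabel) === 1;
$unicodeOk  = preg_match($labelPattern, $unicodeLabel) === 1;

var_dump($punycodeOk);	// the ASCII form matches the pattern
var_dump($unicodeOk);	// the "ü" bytes are outside a-z0-9-, so this one doesn't
```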

Note: I've removed the check from this snippet that made sure each label
of the domain didn't consist only of numbers, as that restriction only
applies to the TLD. I'd misread the spec a little! So this is better to
use than the code from the other thread.

Thanks,
Ash
http://www.ashleysheridan.co.uk