On 10/24/05, Manuel Lemos <mlemos@xxxxxxx> wrote:
> on 10/23/2005 07:21 PM Robin Vickery said the following:
> >>
> >>> ... would it not make sense for there to be a BUILT-IN PHP function of
> >>> a TRUE email syntactic validation?
> >> I don't see that being much better than passing a good regular
> >> expression to preg_match.
> >
> > 1. Technically you can't write a regular expression that matches
> > *all* valid email addresses as part of the address specification is
> > recursive.
> >
> >    ccontent = ctext / quoted-pair / comment
> >    comment  = "(" *([FWS] ccontent) [FWS] ")"
> >
> > Admittedly 99.99% of people don't even know you *can* comment email
> > addresses so it's not a huge problem...
>
> If I am not mistaken, PCRE supports recursive regular expressions.

I'm afraid not. You can hack recursion in Perl with the (??{ }) postponed
expression construct. But PCRE doesn't support it.

Without recursion, the best you can do is decide on a reasonable depth of
nested comments and hardcode that.

> Anyway, the way I got the RFC that is not quite the form of an address
> but the way it may be presented in message header. Meaning, you can add
> comments in To: or other e-mail header but in reality the comments are
> not part of the address.

I'm not sure exactly what you mean here. It's true that comments don't
affect how mail gets delivered, but they're very definitely part of the
address and may well have a meaning to the recipient that you can't
predict. They could be using them for anti-spam or to distinguish between
users of a mailbox or... well, anything really. That is the reason
RFC-2821 recommends that they be passed to the recipient unchanged.

> > 2. Very few people seem to be capable of recognising a *good* regular
> > expression, let alone writing one. It seems clear that validating an
> > address is a task that many people want to do, but few can do
> > properly. I'd say that's a good reason for making it a built-in
> > function.
>
> Yes.
> What I meant is that just copying a good enough regular expression
> would be sufficient to use it. There is no need to understand it.

I had a quick look through my email last night and found 14 different
email validation regular expressions posted to this list in the last few
months. All of them would falsely reject valid addresses even without
taking comments into account.

6 of them wouldn't even allow "judy.o'grady@xxxxxxxxxxx" and another 3
would reject mail from the entire .museum TLD.

What that would indicate to me is that many people can't even recognise
what a "good enough" regular expression looks like.

> What I meant is that despite I use that regular expression for many
> years without complaints, it could be improved to reject only invalid
> characters, but of course that is not what that expression does.

Possibly because those whose email addresses it rejected couldn't contact
you to complain? :-)

Actually, I have very little problem with your regexp - I'd like it to
handle domain literals, as they can be useful in communicating with
people with broken DNS. But that's about it.

-robin

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
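The fixed-depth workaround for nested comments can be sketched as follows. Python is used purely for illustration; its standard `re` module has no recursive construct, so the maximum depth has to be baked into the pattern when it is built. The helper names `comment_pattern` and `is_comment` are hypothetical, and the character classes are a simplification of the RFC's ctext, quoted-pair and FWS rules rather than a faithful transcription: any character other than `(`, `)` or `\` is treated as comment text, whitespace included.

```python
import re

# Simplified atoms (an assumption, not the exact RFC grammar):
# comment text is any character except "(", ")" and "\", which also
# absorbs the folding whitespace (FWS) the RFC allows between parts;
# a quoted-pair is a backslash followed by any single character.
CTEXT = r"[^()\\]"
QUOTED_PAIR = r"\\."


def comment_pattern(max_depth: int) -> str:
    """Build a regex for a comment nested at most max_depth levels deep."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    # Innermost level: a parenthesised comment with no comments inside it.
    pattern = r"\((?:" + CTEXT + "|" + QUOTED_PAIR + r")*\)"
    # Each pass wraps the previous level, allowing it as one more kind of
    # content, so the nesting limit is hardcoded into the final pattern.
    for _ in range(max_depth - 1):
        pattern = r"\((?:" + CTEXT + "|" + QUOTED_PAIR + "|" + pattern + r")*\)"
    return pattern


def is_comment(text: str, max_depth: int) -> bool:
    """True if the whole of text is one comment within the depth limit."""
    return re.fullmatch(comment_pattern(max_depth), text, re.DOTALL) is not None


if __name__ == "__main__":
    print(is_comment("(a (b) c)", 1))  # False: nested one level too deep
    print(is_comment("(a (b) c)", 2))  # True
```

Anything nested deeper than the chosen limit is simply rejected, which is the trade-off of hardcoding the depth instead of recursing.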
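The 14 expressions Robin counted are not reproduced in the thread, so `NAIVE` below is a hypothetical stand-in that shows the two failure modes he mentions: a local part with no apostrophe allowed, and a TLD capped at four letters. `LESS_STRICT` is likewise only a sketch, not Manuel's expression (which the thread doesn't show). It takes the local part from RFC 2822's dot-atom, accepts a TLD of any length, and adds dotted IPv4 domain literals of the kind Robin asks for, while still rejecting quoted local parts, comments and IPv6 literals. All addresses are placeholders (`example.com`, `example.museum`, and the 192.0.2.0/24 documentation range).

```python
import re

# Hypothetical overly strict pattern: no "'" in the local part,
# and the TLD must be 2 to 4 letters (so ".museum" fails).
NAIVE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}")

# RFC 2822 atext: letters, digits and these specials, including "'".
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = ATEXT + r"+(?:\." + ATEXT + r"+)*"

# Hostname: labels of letters, digits and inner hyphens; any TLD length.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOSTNAME = LABEL + r"(?:\." + LABEL + r")+"

# Domain literal, restricted here to a dotted IPv4 address in brackets.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4_LITERAL = r"\[" + OCTET + r"(?:\." + OCTET + r"){3}\]"

LESS_STRICT = re.compile(DOT_ATOM + "@(?:" + HOSTNAME + "|" + IPV4_LITERAL + ")")


def naive_ok(address: str) -> bool:
    return NAIVE.fullmatch(address) is not None


def less_strict_ok(address: str) -> bool:
    return LESS_STRICT.fullmatch(address) is not None


if __name__ == "__main__":
    # The naive pattern rejects all three; the less strict one accepts them.
    for addr in ["judy.o'grady@example.com",
                 "curator@example.museum",
                 "user@[192.0.2.1]"]:
        print(addr, naive_ok(addr), less_strict_ok(addr))
```

Both functions check syntax only; they say nothing about whether the mailbox exists.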