On Wed, Sep 07, 2005 at 01:33:51PM -0700, Randal L. Schwartz wrote:
> >>>>> "Steve" == Steve Atkins <steve@xxxxxxxxxxx> writes:
>
> Steve> But, depending on what you're doing, validation may not be a good
> Steve> idea. There are email addresses that are syntactically invalid that
> Steve> are deliverable and in active use.
>
> Really? Name one. Or maybe it's just your idea of syntax that's wrong.

Well, my idea of syntax may differ from yours, but it doesn't
necessarily mean that either of us is wrong.

If we were talking about the formal grammar in RFC2822 section 3.4.1
I'd agree with you. But reading the surrounding text implies that the
spec is tighter than the formal grammar says it is.

2822 syntax allows almost any character in the domain-part (excluding
brackets, whitespace and backslash only, IIRC) but 2822 also describes
the dot-atom form of the domain part as an internet domain name, either
an MX or a hostname, referring to STD3, STD13 and STD14.

While most characters are legal in the 2822 syntax and in DNS, you can
extract from the RFCs that hostnames really should look like

  /([A-Za-z0-9-]+\.)*[A-Za-z0-9]+/

So I consider any use of characters outside that set in a hostname or
"domain name" to be invalid. Specifically, an underscore is not a valid
character, so any use of an underscore in the domain-part of an address
that is supposedly an internet address is syntactically invalid.

And yet there are quite a lot of hosts that have underscores in their
names. Mail to them is deliverable. I've seen them in use occasionally,
though I've no idea how reliable they are.

All of which is a nice bit of RFC-lawyering, but not really that
relevant. The obvious response demonstrating that "steve@foo&bar+baz"
is syntactically valid would be an equally good bit of RFC-lawyering
too. :)

More practically (and this is a pragmatic database list, not an
esoteric rules-lawyering anti-spam list :) ) I've found that the RE I
mentioned earlier - allowing underscore, but excluding the other
invalid hostname characters - is pretty good at spotting the usual
badly formatted email addresses you see, without stumbling over the
ones that many "email address validators" do.

It punts on the whole "what is a reasonable looking local part?"
question, of course, but that's near impossible to answer in a useful,
practical sense other than being nervous about whitespace or anything
smacking of source routing.

Cheers,
  Steve
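
For what it's worth, the domain-part check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the strict pattern is the hostname RE quoted in the message, while the underscore-tolerant variant is a guess at "the RE I mentioned earlier", which is not reproduced in this message. The names STRICT_DOMAIN, LENIENT_DOMAIN, domain_part and domain_looks_valid are made up for the example, and the local part is deliberately not inspected.

```python
import re

# Hostname pattern quoted in the message; fullmatch() anchors it at both ends.
STRICT_DOMAIN = re.compile(r"([A-Za-z0-9-]+\.)*[A-Za-z0-9]+")

# ASSUMPTION: a lenient variant that also accepts underscores in the
# non-final labels. The RE actually referred to in the message may differ.
LENIENT_DOMAIN = re.compile(r"([A-Za-z0-9_-]+\.)*[A-Za-z0-9]+")


def domain_part(address):
    """Return the text after the last '@', or None if the address has
    no '@' or nothing in front of it."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return None
    return domain


def domain_looks_valid(address, pattern=LENIENT_DOMAIN):
    """Check only the domain part against the given pattern.

    The local part is not judged beyond being non-empty.
    """
    domain = domain_part(address)
    return domain is not None and pattern.fullmatch(domain) is not None
```

With these patterns, steve@mail_host.example.com passes the lenient check but fails the strict one, and steve@foo&bar+baz fails both.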