On Saturday 01 November 2008 18:43:52, Pierre Habouzit wrote:
[...]
> Your regex fails to parse:
>
> "Someone with a comma, and an escape double quote \" in its name"

Easy fix: replace "[^"]+" with "[^"]+(?:\\"[^"]*)*".

> <regex.cant.be.used.for.serious.parsing@xxxxxxxxxxxxx>

Oh yes. Regexes _are_ the way to do serious parsing. All MIME packages you will find floating around use regexes to parse mail headers correctly.

Granted, adhering to RFC822 to the letter is rather hard. But I have a sample program here that not only parses the escaped double quote, but also accounts for headers spanning multiple lines and for multiple headers of the same type, wherever email addresses are valid (To:, Cc:, Bcc:).

See attachment. Feel free to use the code.

----
fg@erwin ~ $ cat t.txt
To: John Doe <some.address@xxxxxxxx>, Random Joe <random.joe@xxxxxxx>, Superman <batman@xxxxxx>, "Someone with a comma, inside its tag name" <a@xxxxx>
To: bbr@xxxxxxxxxxxx, u1@xxxxxxxxxxxx, u2@xxxxxxxxxxx, u3@xxxxxxxx
fg@erwin ~ $ perl t.pl <t.txt
Found mail: John Doe <some.address@xxxxxxxx>
Found mail: Random Joe <random.joe@xxxxxxx>
Found mail: Superman <batman@xxxxxx>
Found mail: "Someone with a comma, inside its tag name" <a@xxxxx>
Found mail: bbr@xxxxxxxxxxxx
Found mail: u1@xxxxxxxxxxxx
Found mail: u2@xxxxxxxxxxx
Found mail: u3@xxxxxxxx
----

-- 
fge
Attachment:
t.pl
Description: Perl program
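
For readers without the attachment, here is a rough sketch of the approach described above: unfold continuation lines, pick out the To:, Cc: and Bcc: headers, then split each value on commas that sit outside double quotes. It is written in Python and is not the attached t.pl. The names (unfold, find_addresses, QUOTED, ITEM, HEADER) and the example.org addresses are made up for illustration, and the quoted-string pattern is a common alternative to the one suggested in the message, not the same regex. It makes no attempt at full RFC822 compliance (no comments, no groups).

```python
import re

# A double-quoted string: any run of ordinary characters or
# backslash-escaped characters (so \" does not end the string).
QUOTED = r'"(?:[^"\\]|\\.)*"'

# One entry of an address list: quoted strings mixed with any
# characters other than a comma or a bare double quote.
ITEM = re.compile(r'(?:' + QUOTED + r'|[^,"])+')

# Headers in which address lists are expected (assumption: only these three).
HEADER = re.compile(r'^(To|Cc|Bcc):[ \t]*(.*)$', re.IGNORECASE | re.MULTILINE)


def unfold(text):
    """Join folded headers: a line break followed by whitespace continues the line."""
    return re.sub(r'\r?\n[ \t]+', ' ', text)


def find_addresses(text):
    """Return every address-list entry found in To:, Cc: and Bcc: headers."""
    found = []
    for header in HEADER.finditer(unfold(text)):
        for item in ITEM.findall(header.group(2)):
            item = item.strip()
            if item:
                found.append(item)
    return found


if __name__ == "__main__":
    # Hypothetical input; continuation lines start with a space.
    sample = r'''To: John Doe <john.doe@example.org>,
 "Someone with a comma, and an escaped double quote \" in its name" <a@example.org>
Cc: u1@example.org, u2@example.org
Subject: commas, here, are ignored
'''
    for mail in find_addresses(sample):
        print("Found mail:", mail)
```

The two alternatives inside ITEM can never start on the same character, so the match does not backtrack badly even on long headers. An unterminated quote is not rejected; the stray quote character is skipped and the remaining text is split as usual.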