On Tue, 25 Nov 2008, Junio C Hamano wrote:
> Trent Piepho <tpiepho@xxxxxxxxxxxxx> writes:
>> See: http://www.washington.edu/pine/tech-notes/low-level.html
>>
>> Entries with a fcc or comment field after the address weren't parsed
>> correctly.
>>
>> Continuation lines, identified by leading spaces, were also not handled.
>>
>> Distribution lists which had ( ) around a list of addresses did not have
>> the parenthesis removed.
>
>> +	pine => sub { my $fh = shift; my $f='\t[^\t]*';
>> +		for (my $x = ''; defined($x); $x = $_) {
>> +			chomp $x;
>> +			$x .= $1 while(defined($_ = <$fh>) && /^ +(.*)$/);
>> +			$x =~ /^(\S+)$f\t\(?([^\t]+?)\)?(:?$f){0,2}$/ or next;
>
> Hmm, so you chomp each continuation line with /^ +(.*)$/ and concatenate
> that to the hold buffer ($x) as long as you see continuation lines,
> a non-continuation line that you read ahead is given to the next round
> (the third part of for(;;) control), checked if you hit an EOF and then
> chomped.  Which means the complicated regexp about the parentheses is
> applied to a logical single line in $x that does not have any newline in
> it, right?

Yes.  The previous regex would just grab the email address with (\S+)$, but
that's not right.  There can be email address with spaces in them, like
"John Doe <jdoe@xxxxxxxx>".  And the email address isn't always the last
field.  So each field has to be put in the regex and \S+ and \s* have to
become [^\t]* and \t to count fields properly.  That's why the regex got so
complex.

> I wonder what this does:
>
>     $x .= $1 while (defined($_ = <$fh>) && /^ +(.*)$/);
>
> when you have "a b" in $x and feed " c\n d\ne\n" to it.  When it leaves
> the loop, you would have "e\n" in $_ for the next round, and "a bcd" (note
> that "bcd" becomes one word) in $x, which I suspect may not be what you
> want.

The tech docs I linked to just say pine continues lines with leading space,
but not how many spaces exactly.
From what I can see it appears to usually use three spaces, but sometimes it
uses one space when wrapping a very long comment field.  It also appears to
only split lines between whitespace and non-whitespace.  So if "a b c d\n"
were to be wrapped, it would be something like "a b \n   c \n   d\n".

If I didn't eat the leading spaces in the continuations, it would be
re-assembled as "a b    c    d", with the continuation indent left in the
middle of the field.  This might cause an address to become
"John Doe    <jdoe@xxxxxxxx>".
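
To make the two behaviours concrete, here is a small standalone sketch.  It
is not the code from the patch: join_continuations() is a hypothetical helper
written only for illustration, and the sample input assumes pine wraps with
three leading spaces and leaves the trailing space on the wrapped line, which
is what I described above but is not stated in the tech notes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of git-send-email): join physical lines
# into logical lines.  A line starting with one or more spaces is treated
# as a continuation of the previous line.
#   $strip true  -> drop the leading spaces of each continuation
#   $strip false -> keep the leading spaces
sub join_continuations {
    my ($strip, @lines) = @_;
    my @logical;
    for my $line (@lines) {
        chomp(my $text = $line);
        if (@logical && $text =~ /^( +)(.*)$/) {
            # Continuation: append to the current logical line.
            $logical[-1] .= $strip ? $2 : $1 . $2;
        } else {
            # Anything else starts a new logical line.
            push @logical, $text;
        }
    }
    return @logical;
}

# Assumed wrapping of "a b c d\n", followed by an unrelated entry "e".
my @wrapped = ("a b \n", "   c \n", "   d\n", "e\n");

print join('|', join_continuations(1, @wrapped)), "\n";  # a b c d|e
print join('|', join_continuations(0, @wrapped)), "\n";  # a b    c    d|e
```

If pine ever wrapped without leaving the trailing space, the stripping
variant would give "a bcd" for Junio's input, so this only works as long as
the trailing-space observation holds.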