On Wed, Jun 8, 2016 at 1:58 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Samuel GROOT <samuel.groot@xxxxxxxxxxxxxxxx> writes: >> +sub parse_email { >> + my %mail = (); >> + my $fh = shift; >> + my $last_header; > >> + # Unfold and parse multiline header fields >> + while (<$fh>) { >> + last if /^\s*$/; > > You stop at the end of fields before the message body starts. > >> + s/\r\n|\n|\r//; > > The pattern is not anchored at the right end of the string; > intended? Is it worth worrying about a lone '\r'? Thanks, I think you covered pretty much everything I was going to say. I'd just add that if the matching is going to be kept loose like this (rather than anchoring it), then s/[\r\n]+//g might be easier to read, but it's a minor point. >> + if (/^([^\s:]+):[\s]+(.*)$/) { >> + $last_header = lc($1); >> + @{$mail{$last_header}} = () >> + unless defined $mail{$last_header}; >> + push @{$mail{$last_header}}, $2; > >> + } elsif (/^\s+\S/ and defined $last_header) { >> + s/^\s+/ /; >> + push @{$mail{$last_header}}, $_; > > Even though the comment said "unfold", you do not really do the > unfolding here and the caller can (if it wants to) figure out where > one logical header was folded in the original into multiple physical > lines, because you are returning an arrayref. Also, the comment about folding lines should be moved down the part of the code which is actually (supposed to be) doing the folding rather than having the comment at the top of the loop. > However, that means the caller still cannot tell what the original > was if you are given: > > X-header: a b > c > X-header: d > > as you would return { 'X-header' => ["a b", "c", "d")] } > > In that sense, it may be better to do a real unfolding here, so that > it would return { 'X-header' => ["a b c", "d"] } from here instead? > > I.e. instead of "push @{...}, $_", append $_ to the last element of > that array? Right. >> + # Separate body from header >> + $mail{"body"} = [(<$fh>)]; >> + >> + return \%mail; > > The name of the local thing is not observable from the caller, but > because this is "parse-email-header" and returns "header fields" > without reading the "mail", perhaps call it %header instead? If there is (for some reason) a mail header named 'body', then this assignment of the body portion of the message will overwrite it. Perhaps this function should instead return multiple values: the header hash, and the message body. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html