On 06/08/2016 07:58 PM, Junio C Hamano wrote:
Samuel GROOT <samuel.groot@xxxxxxxxxxxxxxxx> writes:
+sub parse_email {
+ my %mail = ();
+ my $fh = shift;
+ my $last_header;
+ # Unfold and parse multiline header fields
+ while (<$fh>) {
+ last if /^\s*$/;
You stop at the end of fields before the message body starts.
+ s/\r\n|\n|\r//;
The pattern is not anchored at the right end of the string;
intended? Is it worth worrying about a lone '\r'?
A lone '\r' shouldn't happen, but we are never too careful. It's fixed
with what Eric suggested.
+ if (/^([^\s:]+):[\s]+(.*)$/) {
+ $last_header = lc($1);
+ @{$mail{$last_header}} = ()
+ unless defined $mail{$last_header};
+ push @{$mail{$last_header}}, $2;
+ } elsif (/^\s+\S/ and defined $last_header) {
+ s/^\s+/ /;
+ push @{$mail{$last_header}}, $_;
Even though the comment said "unfold", you do not really do the
unfolding here and the caller can (if it wants to) figure out where
one logical header was folded in the original into multiple physical
lines, because you are returning an arrayref.
However, that means the caller still cannot tell what the original
was if you are given:
X-header: a b
c
X-header: d
as you would return { 'X-header' => ["a b", "c", "d")] }
In that sense, it may be better to do a real unfolding here, so that
it would return { 'X-header' => ["a b c", "d"] } from here instead?
I.e. instead of "push @{...}, $_", append $_ to the last element of
that array?
I will do that. It makes more sense regarding Subject split into several
lines, for example.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html