On Wed, Oct 24, 2012 at 04:46:36AM -0400, Jeff King wrote:
> On Wed, Oct 24, 2012 at 10:03:35AM +0200, Krzysztof Mazur wrote:
>
> > The git-send-email always use RFC2047 subject quoting for files
> > with "broken" encoding - non-ASCII files without Content-Transfer-Encoding,
> > even for ASCII subjects. Now for ASCII subjects the RFC2047 quoting will be
> > skipped.
> > [...]
> > -		if ($broken_encoding{$t} && !is_rfc2047_quoted($subject)) {
> > +		if ($broken_encoding{$t} && !is_rfc2047_quoted($subject) &&
> > +				($subject =~ /[^[:ascii:]]/)) {
>
> Is that test sufficient? We would also need to encode if it has rfc2047
> specials, no?

For Subject this should be sufficient. According to RFC822, "Subject:" is
followed by a "text" token,

--- from RFC822 ---
             /  "Subject"           ":"  *text
--- from RFC822 ---

and text is defined as:

--- from RFC822 ---
     text        =  <any CHAR, including bare    ; => atoms, specials,
                     CR & bare LF, but NOT       ;  comments and
                     including CRLF>             ;  quoted-strings are
                                                 ;  NOT recognized.
--- from RFC822 ---

so only CRLF is not allowed in Subject.

So the problem only exists for broken RFC2047-like text, but I think it's
OK to just pass such subjects through; in most cases the Subject comes from
an already formatted patch file.

I think that we just want to fix Subjects without a specified encoding here.
In most cases, when git-send-email is used for patches generated by
"git format-patch", we just don't want to corrupt the Subject.
"git format-patch" generates "broken" patches when the commit message uses
only ASCII characters and the patch contains some non-ASCII characters.
In this case the original git-send-email, without this patch, adds RFC2047
quoting to a pure ASCII Subject.

>
> It looks like we use the same regex elsewhere. Maybe this would be a
> good chance to abstract out a needs_rfc2047_quoting while we are in the
> area?

It's a good idea; however, the rules are different for Subject and addresses
(sanitize_address).
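As a rough illustration of the Subject-side rule described above (this is
only a standalone sketch, not code from git-send-email.perl; the helper name
subject_has_non_ascii and the sample subjects are made up for the example):
an ASCII Subject containing RFC822 specials is left alone, and only a
Subject with a non-ASCII character would be handed to the quoting code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (assumption, not in git-send-email.perl):
# returns 1 if the Subject text contains any non-ASCII character,
# using the same character class as the test in the patch.
sub subject_has_non_ascii {
	my ($s) = @_;
	return $s =~ /[^[:ascii:]]/ ? 1 : 0;
}

# Made-up sample subjects: the first has RFC822 specials but is pure
# ASCII, the second contains UTF-8 encoded bytes above 0x7f.
my @samples = (
	'[PATCH] foo: handle (bar) <baz@example.com>',
	"[PATCH] caf\xc3\xa9 support",
);

for my $s (@samples) {
	my $verdict = subject_has_non_ascii($s) ? 'needs quoting' : 'pass as-is';
	print "$verdict: $s\n";
}
```

The address case is deliberately not covered here, since the rules for
sanitize_address are different.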
I think we can go even further: we can just add quote_subject(), which
performs this test and calls quote_rfc2047() if necessary.

Below I'm sending a patch that does that.

Krzysiek
--
From a1e6eef831275485ec1555d94ff0d9aac852dd12 Mon Sep 17 00:00:00 2001
From: Krzysztof Mazur <krzysiek@xxxxxxxxxxxx>
Date: Wed, 24 Oct 2012 19:08:57 +0200
Subject: [PATCH] git-send-email: introduce quote_subject()

The quote_rfc2047() function always adds RFC2047 quoting, so to avoid
quoting ASCII subjects the subject must be tested for non-ASCII characters
before calling quote_rfc2047(). To avoid repeating this test, a new
quote_subject() function is introduced. quote_subject() performs the test
and calls quote_rfc2047() only if necessary.

Signed-off-by: Krzysztof Mazur <krzysiek@xxxxxxxxxxxx>
---
 git-send-email.perl | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index efeae4c..e9aec8d 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -657,9 +657,7 @@ EOT
 			$initial_subject = $1;
 			my $subject = $initial_subject;
 			$_ = "Subject: " .
-				($subject =~ /[^[:ascii:]]/ ?
-				 quote_rfc2047($subject, $compose_encoding) :
-				 $subject) .
+				quote_subject($subject, $compose_encoding) .
 				"\n";
 		} elsif (/^In-Reply-To:\s*(.+)\s*$/i) {
 			$initial_reply_to = $1;
@@ -907,6 +905,22 @@ sub is_rfc2047_quoted {
 	$s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
 }
 
+sub subject_needs_rfc2047_quoting {
+	my $s = shift;
+
+	return !is_rfc2047_quoted($s) && ($s =~ /[^[:ascii:]]/);
+}
+
+sub quote_subject {
+	local $subject = shift;
+	my $encoding = shift || 'UTF-8';
+
+	if (subject_needs_rfc2047_quoting($subject)) {
+		return quote_rfc2047($subject, $encoding);
+	}
+	return $subject;
+}
+
 # use the simplest quoting being able to handle the recipient
 sub sanitize_address {
 	my ($recipient) = @_;
@@ -1327,9 +1341,8 @@ foreach my $t (@files) {
 			$body_encoding = $auto_8bit_encoding;
 		}
 
-		if ($broken_encoding{$t} && !is_rfc2047_quoted($subject) &&
-				($subject =~ /[^[:ascii:]]/)) {
-			$subject = quote_rfc2047($subject, $auto_8bit_encoding);
+		if ($broken_encoding{$t}) {
+			$subject = quote_subject($subject, $auto_8bit_encoding);
 		}
 
 		if (defined $author and $author ne $sender) {
--
1.8.0.3.gf4c35fc