Re: [PATCH] git-send-email: skip RFC2047 quoting for ASCII subjects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Oct 24, 2012 at 04:46:36AM -0400, Jeff King wrote:
> On Wed, Oct 24, 2012 at 10:03:35AM +0200, Krzysztof Mazur wrote:
> 
> > The git-send-email always use RFC2047 subject quoting for files
> > with "broken" encoding - non-ASCII files without Content-Transfer-Encoding,
> > even for ASCII subjects. Now for ASCII subjects the RFC2047 quoting will be
> > skipped.
> > [...]
> > -	if ($broken_encoding{$t} && !is_rfc2047_quoted($subject)) {
> > +	if ($broken_encoding{$t} && !is_rfc2047_quoted($subject) &&
> > +			($subject =~ /[^[:ascii:]]/)) {
> 
> Is that test sufficient? We would also need to encode if it has rfc2047
> specials, no?

For Subject this should be sufficient. According to RFC822 after
"Subject:" we have "text" token,

--- from RFC822 ---
                 /  "Subject"           ":"  *text
--- from RFC822 ---

and text is defined as:

--- from RFC822 ---
     text        =  <any CHAR, including bare    ; => atoms, specials,
                     CR & bare LF, but NOT       ;  comments and
                     including CRLF>             ;  quoted-strings are
                                                 ;  NOT recognized.
--- from RFC822 ---

so only CRLF is not allowed in Subject.


So the problem only exists for broken RFC2047-like texts, but I think
it's ok to just pass such subjects, in most cases the Subject comes
from already formatted patch file. I think that we just want to fix Subjects
without specified encoding here.


In most cases, when git-send-email is used for patches generated
by "git format-patch" we just don't want to corrupt Subject. The
"git format-patch" generates "broken" patches when commit message
uses only ASCII characters and patch contains some non-ASCII characters.
In this case original git-send-email, without this patch, adds RFC2047
quotation for pure ASCII Subject.

> 
> It looks like we use the same regex elsewhere. Maybe this would be a
> good chance to abstract out a needs_rfc2047_quoting while we are in the
> area?

It's a good idea, however rules are different for Subject and addresses
(sanitize_address).

I think we can go even further, we can just add quote_subject(),
which performs this test and calls quote_rfc2047() if necessary.
I'm sending bellow patch that does that.

Krzysiek
-- 
>From a1e6eef831275485ec1555d94ff0d9aac852dd12 Mon Sep 17 00:00:00 2001
From: Krzysztof Mazur <krzysiek@xxxxxxxxxxxx>
Date: Wed, 24 Oct 2012 19:08:57 +0200
Subject: [PATCH] git-send-email: introduce quote_subject()

The quote_rfc2047() always adds RFC2047 quoting and to avoid quoting ASCII
subjects, before calling quote_rfc2047() subject must be tested for non-ASCII
characters. To avoid this new quote_subject() function is introduced.
The quote_subject() performs this test and calls quote_rfc2047() only if
necessary.

Signed-off-by: Krzysztof Mazur <krzysiek@xxxxxxxxxxxx>
---
 git-send-email.perl | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index efeae4c..e9aec8d 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -657,9 +657,7 @@ EOT
 			$initial_subject = $1;
 			my $subject = $initial_subject;
 			$_ = "Subject: " .
-				($subject =~ /[^[:ascii:]]/ ?
-				 quote_rfc2047($subject, $compose_encoding) :
-				 $subject) .
+				quote_subject($subject, $compose_encoding) .
 				"\n";
 		} elsif (/^In-Reply-To:\s*(.+)\s*$/i) {
 			$initial_reply_to = $1;
@@ -907,6 +905,22 @@ sub is_rfc2047_quoted {
 	$s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
 }
 
+sub subject_needs_rfc2047_quoting {
+	my $s = shift;
+
+	return !is_rfc2047_quoted($s) && ($s =~ /[^[:ascii:]]/);
+}
+
+sub quote_subject {
+ 	local $subject = shift;
+ 	my $encoding = shift || 'UTF-8';
+
+ 	if (subject_needs_rfc2047_quoting($subject)) {
+		return quote_rfc2047($subject, $encoding);
+ 	}
+ 	return $subject;
+}
+
 # use the simplest quoting being able to handle the recipient
 sub sanitize_address {
 	my ($recipient) = @_;
@@ -1327,9 +1341,8 @@ foreach my $t (@files) {
 		$body_encoding = $auto_8bit_encoding;
 	}
 
-	if ($broken_encoding{$t} && !is_rfc2047_quoted($subject) &&
-			($subject =~ /[^[:ascii:]]/)) {
-		$subject = quote_rfc2047($subject, $auto_8bit_encoding);
+	if ($broken_encoding{$t}) {
+		$subject = quote_subject($subject, $auto_8bit_encoding);
 	}
 
 	if (defined $author and $author ne $sender) {
-- 
1.8.0.3.gf4c35fc

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]