Re: [PATCH] format-patch: teach --no-encode-headers

Jeff King <peff@xxxxxxxx> · Mon, 6 Apr 2020 09:30:40 -0400

On Mon, Apr 06, 2020 at 03:04:44AM +0000, brian m. carlson wrote:

> On 2020-04-05 at 23:11:09, Emma Brooks wrote:
> > When commit subjects or authors have non-ASCII characters, git
> > format-patch Q-encodes them so they can be safely sent over email.
> > However, if the patch transfer method is something other than email (web
> > review tools, sneakernet), this only serves to make the patch metadata
> > harder to read without first applying it (unless you can decode RFC 2047
> > in your head). git am as well as some email software supports
> > non-Q-encoded mail as described in RFC 6531.
> 
> Do we always output UTF-8 in this case, or do we sometimes output other
> encodings if the user has specified one for the commit message?

That was my first question, too. But I think even without this option,
we always respect i18n.logOutputEncoding before we even hit the email
pretty-printing code. So by default it would always be UTF-8 (and
otherwise whatever the user has asked us to output).

That would obviously be disastrous for an output encoding that isn't an
ASCII superset, but that's already true for any of our output formats.
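
To make the "ASCII superset" point concrete, here is a small Python
sketch (not git code; the header text is made up). It encodes the same
header line as UTF-8 and as UTF-16 and checks whether a naive
byte-oriented parser would still recognize the field name:

```python
# Hypothetical header line; only the encodings differ below.
header = "Subject: h\u00e9llo\n"

utf8_bytes = header.encode("utf-8")       # UTF-8 is an ASCII superset
utf16_bytes = header.encode("utf-16-le")  # UTF-16 is not


def looks_like_subject(raw: bytes) -> bool:
    """Mimic a byte-oriented parser that expects ASCII field names."""
    return raw.startswith(b"Subject: ")


print(looks_like_subject(utf8_bytes))
print(looks_like_subject(utf16_bytes))
# UTF-16 interleaves NUL bytes with the ASCII characters.
print(b"\x00" in utf16_bytes)
```

The UTF-8 bytes still start with the ASCII field name, while the UTF-16
ones do not, which is the kind of breakage I mean.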

> Do we know how git send-email handles such a message if it receives
> one?
> 
> I know it isn't your intention to work with git send-email in this
> patch, but it would be nice to know whether there's additional value in
> someone sending a followup patch to make git send-email use SMTPUTF8 if
> that's necessary.

I suspect this is mostly orthogonal, as that deals only with the
SMTP-level addresses, which include only the actual email part (not the
name) and aren't RFC2047-encoded anyway. It looks like we already leave
characters in addresses untouched (I'm not even 100% sure that RFC2047
allows modifying within the local part of an addr):

  $ echo foo >file
  $ git add file
  $ git -c user.email=péff@xxxxxxxx commit -m foo
  $ git format-patch -1 --stdout | grep From:
  From: Jeff King <péff@xxxxxxxx>
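
For comparison, here is a rough sketch of the same name-versus-address
split using Python's standard library rather than git (the identity
below is hypothetical). As far as I can tell, formataddr() RFC2047-encodes
only the display name and refuses a non-ASCII address outright:

```python
from email.header import decode_header
from email.utils import formataddr

# Hypothetical identity, for illustration only.
encoded = formataddr(("J\u00e9ff King", "peff@example.com"))
print(encoded)

# Only the display name was turned into an encoded-word; decode it back.
name_part = encoded.rsplit(" <", 1)[0]
raw, charset = decode_header(name_part)[0]
decoded_name = raw.decode(charset)
print(decoded_name)

# A non-ASCII address is rejected rather than encoded.
try:
    formataddr(("Jeff King", "p\u00e9ff@example.com"))
    address_rejected = False
except UnicodeError:
    address_rejected = True
print(address_rejected)
```

That is one library's behavior, not a statement about what the RFCs
require, but it matches the idea that the address part is left alone.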

I did wonder if there are any standards around 8bit headers. Certainly
the de facto standard for local tools (e.g., mutt reading a message
you've edited in vim) is that they can be treated like a stream of
ASCII-compatible bytes, and that works pretty well in practice. But if
there's an IETF-endorsed method for 8bit headers, it would be nice to
use it. For 8bit bodies, we're able to give a content-transfer-encoding
and a content-type with the charset. But I don't know of an equivalent
for headers.
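
To illustrate the body side of that (again with Python's email package
as a stand-in, not anything git does), a message can label its own 8bit
body through those two headers. The body text here is made up:

```python
from email.message import EmailMessage

msg = EmailMessage()
# Ask for an unencoded 8bit body and declare its charset.
msg.set_content("h\u00e9llo\n", charset="utf-8", cte="8bit")

# The body's encoding is described by these two headers.
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```

Nothing comparable is attached to the header block itself, which is the
gap I'm wondering about.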

-Peff