On 1/6/20 5:07 PM, Junio C Hamano wrote: > George Dunlap <george.dunlap@xxxxxxxxxx> writes: > >> On 12/18/19 12:15 PM, George Dunlap wrote: >>> On 12/18/19 11:42 AM, George Dunlap wrote: >>>> Using git 2.24.0 from Debian testing. >>>> >>>> It seems that git-am will strip CRLF endings from mails before applying >>>> patches when the mail isn't encoded in any way. It will also decode >>>> base64-encoded mails. But it won't strip CRLF endings from >>>> base64-encoded mails. >>>> >>>> Attached are two mbox files for two different recent series. >>>> plainenc.am applies cleanly with `git am`, while base64enc.am doesn't. >>>> >>>> Poking around the man pages, it looks like part of the issue might be >>>> that the CRLF stripping is done in `git mailsplit`, before the base64 >>>> encoding, rather than after. >>> >>> Poking around -- it looks like the CRLF stripping would be better done >>> in `git mailinfo` after the decoding. >> >> Anyone want to take this up? I mean, I could try to send a patch, but >> since I've never looked at the git source code before, I'm sure it would >> take me about 10x as much effort for me to do it as for someone already >> familiar with the codebase. > > Even before writing a patch, somebody needs to come up with a > sensible design first. --[no-]keep-cr is about "because transfer of > e-mail messages between MTAs and to the receiving MUA is defined in > terms of CRLF delimited lines per RFC, Git cannot tell if the CRLF > in the input was meant to be part of the patch (i.e. the diff is > describing a change between preimage and postimage of a file that > uses CRLF line endings) or they are cruft added during transit. By > default we favor LF endings so we will strip, but we leave an option > to keep CRs at the end of lines". > > What you are asking for is quite different, isn't it? "We know the > CRLF in the payload is from the original because they were protected > from getting munged during the transfer by being MIME-encased. > Please tell Git to preprocess that payload to convert CRLF to LF > before treating it as a patch". Actually that's not true. :-) The RFC specifies CRLF for text sections regardless of the encoding: 8<---- The canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence. Similarly, any occurrence of CRLF in MIME "text" MUST represent a line break. Use of CR and LF outside of line break sequences is also forbidden. ----->8 [1] https://tools.ietf.org/html/rfc2046#section-4.1.1 Just as for plaintext encoding, we do not, in fact, know that CRLF is in the original; in the example I included above, I'm confident that CRLF is *not* in the original, but was added by an MTA afterwards. As such, at the moment what `mailsplit` does is generate a load of non-RFC-compliant email messages (since text-plain encoded messages won't have CRLF, in contravention of the spec), and `mailinfo` incorrectly interprets base64-encoded sections. Moving the CR-stripping from mailsplit to mailinfo would make them both more RFC-compliant. (Sorry this wasn't in the original report -- I've been doing a lot more digging since then.) > So, if you are imagining to change the meaning of --[no-]keep-cr, I > do not think it will fly (that is why I said that we need a sensible > design before a patch). > > And by stepping back a bit like so, and once we start viewing this > as "after receiving a piece of e-mail from MUA (where --[no-]keep-cr > may affect the outermost CRLF line endings) and unwrapping possible > MIME-encasing, we can optionally tell Git to pass the payload > further through a preprocess filter", we'd realize that this does > not have to be limited to just running dos2unix (you may want to run > iconv to fix encodings, for example), which would mean that the new > flag may not just want to be --strip-cr, which is too limiting, but > rather want to be --filter-message=<how> where <how> could be one of > the canned preprocess filter (among which your dos2unix may exist) > or an external script. > > I am not saying that "--filter-message=<how>" must be the "sensible > design" I mentioned at the beginning of this message---the above is > to illustrate what kind of thought needs to go in before even the > first line of the patch gets written. As I mentioned in another thread, there's already an `applypatch-msg` hook which can be used to do arbitrary modifications on commit messages before applying. Another way to fix this would be to add an `applypatch-patch` hook, which allowed you to do arbitrary modifications on the patch before applying. I certainly think that `applypatch-patch` would be a useful thing to add. But since making `mailsplit` and `mailinfo` more RFC-compliant is both good in itself, and probably easier, I still think that's the best thing to do first. -George