Re: git format-patch escaping issues in the patch format

"Kristoffer Haugsbakk" <kristofferhaugsbakk@xxxxxxxxxxxx> · Tue, 05 Nov 2024 15:32:21 +0100

On Tue, Nov 5, 2024, at 02:26, Christoph Anton Mitterer wrote:
> On Mon, 2024-11-04 at 23:15 +0100, Kristoffer Haugsbakk wrote:
>> It seems to me (totally naïve) that you would do something like
>>
>> 1. Blank line terminates headers
>> 2. Then there might be some optional commit-headers which can
>> override
>>    things (`From`)
>> 3. Commit message
>> 4. `---`
>> 5. Look for a regex `^diff` line
>>    • Now the indentation will tell you when it ends
>> 6. `^Range-diff` and `^Interdiff` can also make an appearance in this
>>    section
>
> Well as you've seen by the follow-up, such a naive approach is not
> really possible, as the commit message may also contain ---, unified
> diffs, etc.

It’s possible in the sense that it would work just as well as
git-format-patch(1).

You could make it more robust with some backtracking, like finding the
last `^diff` and movig back.  That’s OK in a file with only one patch
(`.patch`) but harder to do for an mbox file.

> […]
>> It seems like it would be nice if format-patch complained if it found
>> regex `^---$` in the commit body.
>
> Actually already when committing... cause there it's taken as valid and
> then it should also work with any following tools.

That would inconvenience all users that never use format-patch.  You
could provide an advice/hint about some configuration variable which
turns off this new default.  But that’s a lot of work to benefit
format-patch users.

Such an error would have to be a default because an opt-in would only be
discovered (in practice) after you’ve been burned once.  And then the
opt-in error is of very little value.  You’re realistically not going to
forget and write commit messages with `^---$` after that.

>> The magic string is unlikely but could happen.  The solution is to
>> use
>> an indented block.  Same for the diff.  (Hopefully few have to
>> code-quote diffs)
>
> As written in the other mail, there is nothing real obvious for the
> user that this wouldn’t be allowed, and in fact committing and such
> works.
> The simple problem here is the fuzzy format which cannot be parsed
> properly.

It’s not non-obvious either.  The simple format is apparent if you
review your patches before sending them out into the world.  (I’m
paranoid so I do that)

>> But escaping things in this format?
>
> Coudln't one do something like this:
>
> If the line consisting of three - is the line that ends the commit
> message, check during format-patch whether it's contained in that.
> If not, generate the patch as now.
> If so, use another magic timestamp and/or (since that might get lost
> when sending as mail, set some X-git-patch-format: header, there adding
> perhaps a flag like "escaped" and if that's set, any line that matches
> the regexp:
>>*---
> get's another > prepended when escaping, and one removed when
> unescaping (well in the latter only lines that match >+---).
> * = 0..n
> + = 1..n
>
> Or probably thinking about some more sophisticated solution or at least
> a better character than > .

That’s interesting and a good idea to use an email header to signal the
escaping.

Thinking just about `^---$`: an email header could be generated if
`^---$` occurs in the commit message.  Then it could suggest something
non-occurring instead.  Simply `-----` or `***` or something.

But this might not just help with the commit message separator.
Apparently the status quo is that git-am(1) will try to apply the diff
in a commit message.  Even though it hasn’t seen `^---$` yet.  But if it
was required to see that first then you could do something like:

```
diff ...
<diff content>
```

And not have any issues.

Of course you can’t change the magic mbox string.  But I don’t see how
you can’t do some more expensive parsing and require that you see the
end of the commit message before parsing this string as the
marker/delimiter.

Some niggles: the commit message might just be the subject line and
there might be no diff (empty patch).  Then looking for `^---$` will
take you too far.  Maybe just look for the signature line.