git format-patch escaping issues in the patch format

Hey there.

For a special use case, I wanted to write a parser for the patch
format created by git format-patch, in particular one that can separate
the headers, the commit message and the actual unified diffs.

Unfortunately, there seems to be only little (written) definition of
that format; git-format-patch(1) merely says it's in UNIX mailbox format
(which itself is, AFAIK, not really formally defined).


Anyway, it turns out that apparently no escaping is done for the commit
message in the patch format, and that this can cause actual breakage
with valid commit messages.

Consider the following example:
1. I create a fresh repo, add a test file and use a commit message
   that contains a From line (even with the "magic" timestamp) and
   some made-up commit id (0000...):

   ~/test$ git init foo; cd foo
   Initialized empty Git repository in /home/calestyo/test/foo/.git/
   ~/test/foo$ echo a >f; git add f
   ~/test/foo$ git commit -m "msg1
   
   From 0000000000000000000000000000000061603705 Mon Sep 17 00:00:00 2001
   --
   ---"
   [master (root-commit) c08debc] msg1
    1 file changed, 1 insertion(+)
    create mode 100644 f
   
   
2. The format-patch output for that already looks suspicious:
   - The From line is not escaped (as some variants of mbox would do
     with >, some properly, some causing corruption by the escaping
     itself).
   - What the format may regard as the separator after the commit
     message (namely the ---) cannot be relied on either, as a ---
     in the commit message is again not escaped.
   
   ~/test/foo$ git format-patch --root; cat 0001-msg1.patch; rm -f 0001-msg1.patch
   0001-msg1.patch
   From c08debcc502c78786ec71d50686ff0445a13b654 Mon Sep 17 00:00:00 2001
   From: Christoph Anton Mitterer <mail@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
   Date: Mon, 4 Nov 2024 19:58:45 +0100
   Subject: [PATCH] msg1
   
   From 0000000000000000000000000000000061603705 Mon Sep 17 00:00:00 2001
   --
   ---
   ---
    f | 1 +
    1 file changed, 1 insertion(+)
    create mode 100644 f
   
   diff --git a/f b/f
   new file mode 100644
   index 0000000..7898192
   --- /dev/null
   +++ b/f
   @@ -0,0 +1 @@
   +a
   -- 
   2.45.2
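
To illustrate the second point, here is a minimal sketch (not how git
itself parses anything) that lists every line of a patch consisting
solely of ---. The names sample_patch and separator_candidates are made
up for this example, and the sample is a shortened copy of the patch
above.

```shell
# Shortened copy of 0001-msg1.patch as shown above (some headers left out).
sample_patch() {
    cat <<'EOF'
From c08debcc502c78786ec71d50686ff0445a13b654 Mon Sep 17 00:00:00 2001
Subject: [PATCH] msg1

From 0000000000000000000000000000000061603705 Mon Sep 17 00:00:00 2001
--
---
---
 f | 1 +
EOF
}

# Hypothetical helper: read a patch on stdin and print the line number
# of every line that is exactly "---".
separator_candidates() {
    grep -n -x -e '---' | cut -d: -f1
}

sample_patch | separator_candidates
```

For the shortened sample this prints 6 and 7, and nothing in the file
itself tells a parser which of the two is the real separator.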
   
   
3. Next I add a 2nd commit, this time using the unified diff from the
   above patch as the commit message body(!):
   
   ~/test/foo$ echo b >>f; git add f
   ~/test/foo$ git commit -m "msg2
   
   diff --git a/f b/f
   new file mode 100644
   index 0000000..7898192
   --- /dev/null
   +++ b/f
   @@ -0,0 +1 @@
   +a
   -- 
   2.45.2"
   [master 6bbe38c] msg2
    1 file changed, 1 insertion(+)
   ~/test/foo$ git format-patch --root
   0001-msg1.patch
   0002-msg2.patch
   
   
4. Unsurprisingly, git itself of course knows the difference between
   commit message and actual patch, as shown e.g. by the following,
   where the commit message is indented (by git):

   $ git log --patch | cat
   commit 6bbe38c33680239ac9767e0e5095f9f32ad41ade
   Author: Christoph Anton Mitterer <mail@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
   Date:   Mon Nov 4 20:00:20 2024 +0100
   
       msg2
       
       diff --git a/f b/f
       new file mode 100644
       index 0000000..7898192
       --- /dev/null
       +++ b/f
       @@ -0,0 +1 @@
       +a
       --
       2.45.2
   
   diff --git a/f b/f
   index 7898192..422c2b7 100644
   --- a/f
   +++ b/f
   @@ -1 +1,2 @@
    a
   +b
   
   commit c08debcc502c78786ec71d50686ff0445a13b654
   Author: Christoph Anton Mitterer <mail@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
   Date:   Mon Nov 4 19:58:45 2024 +0100
   
       msg1
       
       From 0000000000000000000000000000000061603705 Mon Sep 17 00:00:00 2001
       --
       ---
   
   diff --git a/f b/f
   new file mode 100644
   index 0000000..7898192
   --- /dev/null
   +++ b/f
   @@ -0,0 +1 @@
   +a
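
For a parser that can work on a repository rather than on a patch file,
this suggests a way around the ambiguity: ask git for the message and
for the diff separately. A sketch using git show; the function names
are made up, and the argument is any commit-ish in the current repo.

```shell
# Hypothetical helpers; "$1" is a commit-ish in the current repository.

# Print only the raw commit message (subject and body), without a diff.
commit_message() {
    git show -s --format=%B "$1"
}

# Print only the diff of the commit, without header or message
# (an empty --format= suppresses the commit header).
commit_diff() {
    git show --format= --patch "$1"
}

# Example use inside a repository:
#   commit_message HEAD
#   commit_diff HEAD
```

That way a --- or a diff inside the message cannot get mixed up with
the real diff, as the two are never written into the same stream.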
   

5. Next I check whether git am can apply the patches created above in
   a fresh repo:
   
   ~/test/foo$ cd ..; git init bar; cd bar
   Initialized empty Git repository in /home/calestyo/test/bar/.git/
   ~/test/bar$ git am ../foo/0001-msg1.patch
   Patch is empty.
   hint: When you have resolved this problem, run "git am --continue".
   hint: If you prefer to skip this patch, run "git am --skip" instead.
   hint: To record the empty patch as an empty commit, run "git am --allow-empty".
   hint: To restore the original branch and stop patching, run "git am --abort".
   hint: Disable this message with "git config advice.mergeConflict false"
   
   That already fails for the first patch, the reason probably being my
      From 0000...
   line in the commit message.
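
That suspicion can be checked roughly by counting the lines that look
like the separator format-patch writes at the top of each patch. A
sketch, assuming such a line has the shape
"From <hex id> Mon Sep 17 00:00:00 2001"; the exact test git applies
when splitting an mbox may well differ, and the names below are made up.

```shell
# Shortened copy of 0001-msg1.patch as shown in step 2.
sample_mail() {
    cat <<'EOF'
From c08debcc502c78786ec71d50686ff0445a13b654 Mon Sep 17 00:00:00 2001
Subject: [PATCH] msg1

From 0000000000000000000000000000000061603705 Mon Sep 17 00:00:00 2001
--
---
---
 f | 1 +
EOF
}

# Hypothetical helper: read a patch file on stdin and count the lines
# that look like a format-patch "From <id> <magic timestamp>" line.
# Assumption: this pattern only approximates what git looks for.
count_from_separators() {
    grep -c -E '^From [0-9a-f]+ Mon Sep 17 00:00:00 2001$'
}

sample_mail | count_from_separators
```

The shortened sample gives 2. A count above 1 in a file that holds a
single patch means the commit message contains a line that a splitter
could mistake for the start of the next mail.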
   
   
6. So I try again, with that From 0000... line simply removed:
   
   ~/test/bar$ sed -i '/^From 00000/d' ../foo/0001-msg1.patch
   ~/test/bar$ git am ../foo/0001-msg1.patch
   fatal: previous rebase directory .git/rebase-apply still exists but mbox given.
   
   That fails because the unfinished git am session from step 5 is
   still around, so once more in a freshly created repo:
   
   ~/test/bar$ cd ..; rm -rf bar; git init bar; cd bar
   Initialized empty Git repository in /home/calestyo/test/bar/.git/
   ~/test/bar$ git am ../foo/0001-msg1.patch
   Applying: msg1
   applying to an empty history
   
   Ah, now it works, so it was indeed the (unusual but still valid)
   commit message.
   
   
7. Now that 0001-msg1.patch is applied, let's try the 2nd patch:
   
   ~/test/bar$ git am ../foo/0002-msg2.patch
   Applying: msg2
   error: f: already exists in index
   Patch failed at 0001 msg2
   hint: Use 'git am --show-current-patch=diff' to see the failed patch
   hint: When you have resolved this problem, run "git am --continue".
   hint: If you prefer to skip this patch, run "git am --skip" instead.
   hint: To restore the original branch and stop patching, run "git am --abort".
   hint: Disable this message with "git config advice.mergeConflict false"
   ~/test/bar$ 
   
   And again, the reason is most likely that git cannot tell that the
   "first diff" is not actually a diff but part of the commit message.
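
What probably happens can be mimicked with a naive boundary finder that
takes the first line starting with "diff --git " or consisting of ---
as the start of the patch. This is only a sketch: the rule git am
really uses may differ, the names are made up, and the sample is a
hand-written approximation of 0002-msg2.patch (whose content is not
shown above).

```shell
# Hand-written approximation of 0002-msg2.patch; the real file is not
# shown in this mail, so headers and the trailing signature are left out.
sample_msg2_patch() {
    cat <<'EOF'
From 6bbe38c33680239ac9767e0e5095f9f32ad41ade Mon Sep 17 00:00:00 2001
Subject: [PATCH 2/2] msg2

diff --git a/f b/f
new file mode 100644
--- /dev/null
+++ b/f
@@ -0,0 +1 @@
+a
---
 f | 1 +

diff --git a/f b/f
index 7898192..422c2b7 100644
--- a/f
+++ b/f
@@ -1 +1,2 @@
 a
+b
EOF
}

# Hypothetical helper: read a patch on stdin and print the number of the
# first line that starts with "diff --git " or is a bare "---" line.
first_patch_start() {
    awk '/^diff --git / || /^---[ \t]*$/ { print NR; exit }'
}

sample_msg2_patch | first_patch_start
```

It prints 4 for the sample, i.e. the diff inside the commit message,
while the separator written by format-patch only comes at line 10.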
   

Btw, a shamelessly off-topic question:
Is there any chance that git format-patch / am will ever support
keeping track of the branch/merge history of generated / applied
patches? That would be really neat.


Thanks,
Chris.

PS: Not subscribed, so please keep me CCed in case you want me to read
    any possible replies :-)




