Re: What's cooking in git.git (Sep 2021, #08; Mon, 27)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 28, 2021 at 4:40 PM Jeff King <peff@xxxxxxxx> wrote:
>
> I just left a rather long-ish mail in the thread, but the gist of it is
> that I'm worried that there's some possibility of corruption there if
> the diff code writes objects. I didn't do a proof-of-concept there, but
> I worked one up just now. Try this:

Thanks for the testcase; it's very helpful.

Let's make it a little more interesting, though...

>
>   # make a repo
>   git init repo
>   cd repo
>   echo base >file
>   git add file
>   git commit -m base
>
>   # two diverging branches
>   echo main >file
>   git commit -am main
>   git checkout -b side HEAD^
>   echo side >file
>   git commit -am side
>
>   # we get a conflict, and we resolve
>   git merge main
>   echo resolved >file
>   git commit -am merged
>
>   # now imagine we had a file with a diff driver. I stuffed it
>   # in here after the fact, but it could have been here all along,
>   # or come as part of the merge, etc.
>   echo whatever >unrelated
>   echo "unrelated diff=foo" >.gitattributes

If we change this line to:

  echo "file diff=foo" >.gitattributes

Then the userdiff will fire on the file involved in the remerge-diff,
including on the file included in the automatic 3-way content
conflict, giving us a diff that looks like this:

diff --git a/file b/file
index 5c66bc5..2ab19ae 100644
--- a/file
+++ b/file
@@ -1,7 +1 @@
-<<<<<<< 7D4147C (SIDE)
-SIDE
-||||||| BE54C04
-BASE
-=======
-MAIN
->>>>>>> 03FC4E3 (MAIN)
+RESOLVED

>   git add .
>   git commit --amend --no-edit
>
>   # set up the diff driver not just to do a textconv, but to cache the
>   # result. This will require writing out new objects for the cache
>   # as part of the diff operation.
>   git config diff.foo.textconv "$PWD/upcase"
>   git config diff.foo.cachetextconv true
>   cat >upcase <<\EOF &&
>   #!/bin/sh
>   tr a-z A-Z <$1
>   EOF
>   chmod +x upcase
>
>   # now show the diff
>   git log -1 --remerge-diff
>
>   # and make sure the repo is still OK
>   git fsck
>
> The remerge diff will correctly show the textconv'd content (because
> it's not in either parent, and hence an evil merge):
>
>   diff --git a/unrelated b/unrelated
>   new file mode 100644
>   index 0000000..982793c
>   --- /dev/null
>   +++ b/unrelated
>   @@ -0,0 +1 @@
>   +WHATEVER
>
> but then fsck shows that our cache is corrupted:
>
>   Checking object directories: 100% (256/256), done.
>   error: refs/notes/textconv/foo: invalid sha1 pointer 1e9b3ecffca73411001043fe914a7ec0e7291043
>   error: refs/notes/textconv/foo: invalid reflog entry 1e9b3ecffca73411001043fe914a7ec0e7291043
>   dangling tree 569b6e414203d2f50bdaafc1585eae11e28afc37


> Now I'll admit the textconv-cache is a pretty seldom-used feature. And
> that there _probably_ aren't a lot of other ways that the diff code
> would try to write objects or references. But I think it speaks to the
> kind of subtle interactions we should worry about (and certainly
> corrupting the repository is a pretty bad outcome, though at least in
> this case it's "just" a cache and we could blow away that ref manually).


I implemented the pretend_object_file() solution on Saturday and tried
this out, and found with the above change that it has corruption
problems as well.  This really doesn't feel much safer too me.

The move-the-tmp-objdir-to-just-being-an-alternate-instead-of-a-primary
solution, as suggested at
https://lore.kernel.org/git/YVOiggCWAdZcxAb6@xxxxxxxxxxxxxxxxxxxxxxx/
has the same problem.

But, more importantly, neither of these solutions can be made safe.
Even if we adopt the GIT_QUARANTINE_PATH suggestion from Neeraj
(https://lore.kernel.org/git/20210929184339.GA19712@neerajsi-x1.localdomain/),
we still have objects referencing now-missing objects.

*However*, if we use the tmp-objdir-as-primary solution, and leave it
as a primary, and use your modification to the GIT_QUARANTINE_PATH
idea from Neeraj (i.e. create a "the-tmp-objdir" and have the refs
code check for it and turn on the quarantine automatically when one
exists), then we prevent adding or modifying refs to point to bad
objects, and any new objects that textconv might write that could
depend on transient objects, will themselves be transient and go away.
So, this solution is safe from corruption.  I'll be implementing it.

(not sure how quickly I'll get it done, as my time is mostly limited
to early mornings or evenings or Saturdays right now...)



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux