On Tue, Sep 28, 2021 at 4:40 PM Jeff King <peff@xxxxxxxx> wrote: > > I just left a rather long-ish mail in the thread, but the gist of it is > that I'm worried that there's some possibility of corruption there if > the diff code writes objects. I didn't do a proof-of-concept there, but > I worked one up just now. Try this: Thanks for the testcase; it's very helpful. Let's make it a little more interesting, though... > > # make a repo > git init repo > cd repo > echo base >file > git add file > git commit -m base > > # two diverging branches > echo main >file > git commit -am main > git checkout -b side HEAD^ > echo side >file > git commit -am side > > # we get a conflict, and we resolve > git merge main > echo resolved >file > git commit -am merged > > # now imagine we had a file with a diff driver. I stuffed it > # in here after the fact, but it could have been here all along, > # or come as part of the merge, etc. > echo whatever >unrelated > echo "unrelated diff=foo" >.gitattributes If we change this line to: echo "file diff=foo" >.gitattributes Then the userdiff will fire on the file involved in the remerge-diff, including on the file included in the automatic 3-way content conflict, giving us a diff that looks like this: diff --git a/file b/file index 5c66bc5..2ab19ae 100644 --- a/file +++ b/file @@ -1,7 +1 @@ -<<<<<<< 7D4147C (SIDE) -SIDE -||||||| BE54C04 -BASE -======= -MAIN ->>>>>>> 03FC4E3 (MAIN) +RESOLVED > git add . > git commit --amend --no-edit > > # set up the diff driver not just to do a textconv, but to cache the > # result. This will require writing out new objects for the cache > # as part of the diff operation. > git config diff.foo.textconv "$PWD/upcase" > git config diff.foo.cachetextconv true > cat >upcase <<\EOF && > #!/bin/sh > tr a-z A-Z <$1 > EOF > chmod +x upcase > > # now show the diff > git log -1 --remerge-diff > > # and make sure the repo is still OK > git fsck > > The remerge diff will correctly show the textconv'd content (because > it's not in either parent, and hence an evil merge): > > diff --git a/unrelated b/unrelated > new file mode 100644 > index 0000000..982793c > --- /dev/null > +++ b/unrelated > @@ -0,0 +1 @@ > +WHATEVER > > but then fsck shows that our cache is corrupted: > > Checking object directories: 100% (256/256), done. > error: refs/notes/textconv/foo: invalid sha1 pointer 1e9b3ecffca73411001043fe914a7ec0e7291043 > error: refs/notes/textconv/foo: invalid reflog entry 1e9b3ecffca73411001043fe914a7ec0e7291043 > dangling tree 569b6e414203d2f50bdaafc1585eae11e28afc37 > Now I'll admit the textconv-cache is a pretty seldom-used feature. And > that there _probably_ aren't a lot of other ways that the diff code > would try to write objects or references. But I think it speaks to the > kind of subtle interactions we should worry about (and certainly > corrupting the repository is a pretty bad outcome, though at least in > this case it's "just" a cache and we could blow away that ref manually). I implemented the pretend_object_file() solution on Saturday and tried this out, and found with the above change that it has corruption problems as well. This really doesn't feel much safer too me. The move-the-tmp-objdir-to-just-being-an-alternate-instead-of-a-primary solution, as suggested at https://lore.kernel.org/git/YVOiggCWAdZcxAb6@xxxxxxxxxxxxxxxxxxxxxxx/ has the same problem. But, more importantly, neither of these solutions can be made safe. Even if we adopt the GIT_QUARANTINE_PATH suggestion from Neeraj (https://lore.kernel.org/git/20210929184339.GA19712@neerajsi-x1.localdomain/), we still have objects referencing now-missing objects. *However*, if we use the tmp-objdir-as-primary solution, and leave it as a primary, and use your modification to the GIT_QUARANTINE_PATH idea from Neeraj (i.e. create a "the-tmp-objdir" and have the refs code check for it and turn on the quarantine automatically when one exists), then we prevent adding or modifying refs to point to bad objects, and any new objects that textconv might write that could depend on transient objects, will themselves be transient and go away. So, this solution is safe from corruption. I'll be implementing it. (not sure how quickly I'll get it done, as my time is mostly limited to early mornings or evenings or Saturdays right now...)