On Thu, Sep 09 2021, brian m. carlson wrote: > [[PGP Signed Part:Undecided]] > I'm having a bit of a problem with segfaults using > 2.33.0.252.g9b09ab0cd71. I was in the process of running "git rebase > --continue" with that version to resolve some conflicts in a project I'm > working on. At that point, it segfaulted, and I got this (apologies for > the French): > > error: impossible d'analyser la section en conflit dans 'scutiger-lfs/src/bin/git-lfs-transfer.rs' > error: impossible d'analyser la section en conflit dans 'scutiger-lfs/src/bin/git-lfs-transfer.rs' > Pré-image enregistrée pour 'scutiger-lfs/src/bin/git-lfs-transfer.rs' > [HEAD détachée 5134185] scutiger-lfs: move BatchItem, Mode, and Oid to library > 2 files changed, 74 insertions(+), 9 deletions(-) > Fusion automatique de scutiger-lfs/src/processor.rs > Fusion automatique de scutiger-lfs/src/bin/git-lfs-transfer.rs > CONFLIT (contenu) : Conflit de fusion dans scutiger-lfs/src/bin/git-lfs-transfer.rs > error: impossible d'appliquer 2cd1615... scutiger-lfs: move download and verify actions into backend > Resolve all conflicts manually, mark them as resolved with > "git add/rm <conflicted_files>", then run "git rebase --continue". > You can instead skip this commit: run "git rebase --skip". > To abort and get back to the state before "git rebase", run "git rebase --abort". > error: impossible d'analyser la section en conflit dans 'scutiger-lfs/src/bin/git-lfs-transfer.rs' > zsh: segmentation fault git rebase --continue > > I can provide a tarball of the broken repo upon request. It's 108 MB, > so you'll need to provide some place for me to drop it. > > At that point, I needed to remove .git/MERGE_RR.lock (which is empty), > and I ran "git rebase --abort" (because I realized my resolution was > bad). Upon which, I received a second segfault (traceback from tip of > next): > > #0 0x00000000005b6e6b in has_rerere_resolution (id=0x229b3a0) at rerere.c:162 > 162 return ((id->collection->status[variant] & both) == both); > > #0 0x00000000005b6e6b in has_rerere_resolution (id=0x229b3a0) at rerere.c:162 > both = 3 > variant = 0 > #1 rerere_clear (r=0x722880 <the_repo>, merge_rr=merge_rr@entry=0x7ffe8c703810) at rerere.c:1239 > id = 0x229b3a0 > i = 0 > > It appears to be because id->collection->status here is NULL. It's > unclear to me why that's the case, but it does appear to be linked to my > .git/MERGE_RR file, which looks like this (xxd -g1): > > 00000000: 30 34 39 62 31 34 32 39 34 65 64 30 63 30 65 66 049b14294ed0c0ef > 00000010: 62 65 39 32 66 34 66 64 33 31 31 63 37 63 34 30 be92f4fd311c7c40 > 00000020: 39 30 39 34 65 63 64 65 09 73 63 75 74 69 67 65 9094ecde.scutige > 00000030: 72 2d 6c 66 73 2f 73 72 63 2f 62 69 6e 2f 67 69 r-lfs/src/bin/gi > 00000040: 74 2d 6c 66 73 2d 74 72 61 6e 73 66 65 72 2e 72 t-lfs-transfer.r > 00000050: 73 00 s. > > The corresponding directory under .git/rr-cache is empty. This > specifically seems to be the problem, and I've included a snippet of a > test below that causes the same segfault. My guess is that the original > segfault left the MERGE_RR file present but the files in the rr-cache > directory absent. > > Since rerere isn't a strong suit of mine, I'm not sending a patch, but I > am including a failing test[0] which indicates the problem (and to which > anyone is welcome to add my sign-off) in hopes that someone more > familiar with this subsystem can figure out what's going on. I haven't taken time to familiarize myself with it, but I have one segfault fix for it already[1] which needs a test, I tried and this is completely unrelated. The "fix" for this segfault you're reporting could be, this makes your test pass, along with the rest of the test suite: diff --git a/rerere.c b/rerere.c index d83d58df4fb..cbad6a89ebc 100644 --- a/rerere.c +++ b/rerere.c @@ -157,6 +157,9 @@ static int has_rerere_resolution(const struct rerere_id *id) const int both = RR_HAS_POSTIMAGE|RR_HAS_PREIMAGE; int variant = id->variant; + if (variant == id->collection->status_nr) + return 1; + if (variant < 0) return 0; return ((id->collection->status[variant] & both) == both); But I have no idea if that's sensible. I.e. as you found the immediate cause here is that our "id" is being looked up in "id->collection->status" where that's NULL, there's similar checks elsewhere for "id->collection->status_nr", but if that's correct here I don't know, nor what the deeper logic error is of how the two went out of sync. 1. https://lore.kernel.org/git/87v96p4w3f.fsf@xxxxxxxxxxxxxxxxxxx/ > ----- %< ----- > diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh > index 9f8c76dffb..c44dfe248a 100755 > --- a/t/t4200-rerere.sh > +++ b/t/t4200-rerere.sh > @@ -623,6 +623,7 @@ test_expect_success 'rerere with inner conflict markers' ' > git commit -q -m "will solve conflicts later" && > test_must_fail git merge A && > > + cp .git/MERGE_RR merge_rr && > echo "resolved" >test && > git add test && > git commit -q -m "solved conflict" && > @@ -645,6 +646,13 @@ test_expect_success 'rerere with inner conflict markers' ' > test_cmp expect actual > ' > > +test_expect_success 'rerere clear does not segfault with bad data' ' > + res_id=$($PERL_PATH -nF"\t" -e "print \$F[0]" merge_rr) && > + cp merge_rr .git/MERGE_RR && > + rm -f .git/rr-cache/$res_id/* && > + git rerere clear > +' > + > test_expect_success 'setup simple stage 1 handling' ' > test_create_repo stage_1_handling && > ( > ----- %< ----- > > As an aside, I generally find Git's codebase to be of exceptionally good > quality for a C program, and so seeing two segfaults back to back led me > to downgrade my recently upgraded version of glibc to see if somehow > that was the problem. Unfortunately, that was not the case, and we just > have two separate bugs here. > > [0] The test uses perl because the MERGE_RR file contains a NUL byte and > therefore cannot be used with standard POSIX utilities. (Just a side-note, I think the use of perl here is fine) I haven't tried, but I think you can probably do that res_id assignment with "git grep" and -o, see t7816-grep-binary-pattern.sh for how we can support greps with \0 in them with -f.