On 07/30, Junio C Hamano wrote: > Thomas Gummerer <t.gummerer@xxxxxxxxx> writes: > > > Currently when a user doesn't resolve a conflict in a file, but > > commits the file with the conflict markers, and later the file ends up > > in a state in which rerere can't handle it, subsequent rerere > > operations that are interested in that path, such as 'rerere clear' or > > 'rerere forget <path>' will fail, or even worse in the case of 'rerere > > clear' segfault. > > > > Such states include nested conflicts, or an extra conflict marker that > > doesn't have any match. > > > > This is because the first 'git rerere' when there was only one > > conflict in the file leaves an entry in the MERGE_RR file behind. The > > I find this sentence, especially the "only one conflict in the file" > part, a bit unclear. What does the sentence count as one conflict? > One block of lines enclosed inside "<<<"...">>>" pair? The command > behaves differently when there are two such blocks instead? Yeah as you mentioned below, conflict marker(s) that cannot be parsed here would make more sense. Will adjust the commit message. > > next 'git rerere' will then pick the rerere ID for that file up, and > > not assign a new ID as it can't successfully calculate one. It will > > however still try to do the rerere operation, because of the existing > > ID. As the handle_file function fails, it will remove the 'preimage' > > for the ID in the process, while leaving the ID in the MERGE_RR file. > > > > Now when 'rerere clear' for example is run, it will segfault in > > 'has_rerere_resolution', because status is NULL. > > I think this "status" refers to the collection->status[]. How do we > get into that state, though? > > new_rerere_id() and new_rerere_id_hex() fills id->collection by > calling find_rerere_dir(), which either finds an existing rerere_dir > instance or manufactures one with .status==NULL. The .status[] > array is later grown by calling fit_variant as we scan and find the > pre/post images, but because there is no pre/post image for a file > with unparseable conflicts, it is left NULL. > > So another possible fix could be to make sure that .status[] is only > read when .status_nr says there is something worth reading. I am > not saying that would be a better fix---I am just thinking out loud > to make sure I understand the issue correctly. Yeah what you are writing above matches my understanding, and that should fix the issue as well. I haven't actually tried what you're proposing above, but I think I find it nicer to just remove the entry we can't do anything with anyway. > > To fix this, remove the rerere ID from the MERGE_RR file in the case > > when we can't handle it, and remove the corresponding variant from > > .git/rr-cache/. Removing it unconditionally is fine here, because if > > the user would have resolved the conflict and ran rerere, the entry > > would no longer be in the MERGE_RR file, so we wouldn't have this > > problem in the first place, while if the conflict was not resolved, > > the only thing that's left in the folder is the 'preimage', which by > > itself will be regenerated by git if necessary, so the user won't > > loose any work. > > s/loose/lose/ > > > Note that other variants that have the same conflict ID will not be > > touched. > > Nice. Thanks for a fix. > > > > > Signed-off-by: Thomas Gummerer <t.gummerer@xxxxxxxxx> > > --- > > rerere.c | 12 +++++++----- > > t/t4200-rerere.sh | 22 ++++++++++++++++++++++ > > 2 files changed, 29 insertions(+), 5 deletions(-) > > > > diff --git a/rerere.c b/rerere.c > > index da1ab54027..895ad80c0c 100644 > > --- a/rerere.c > > +++ b/rerere.c > > @@ -823,10 +823,7 @@ static int do_plain_rerere(struct string_list *rr, int fd) > > struct rerere_id *id; > > unsigned char sha1[20]; > > const char *path = conflict.items[i].string; > > - int ret; > > - > > - if (string_list_has_string(rr, path)) > > - continue; > > + int ret, has_string; > > > > /* > > * Ask handle_file() to scan and assign a > > @@ -834,7 +831,12 @@ static int do_plain_rerere(struct string_list *rr, int fd) > > * yet. > > */ > > ret = handle_file(path, sha1, NULL); > > - if (ret < 1) > > + has_string = string_list_has_string(rr, path); > > + if (ret < 0 && has_string) { > > + remove_variant(string_list_lookup(rr, path)->util); > > + string_list_remove(rr, path, 1); > > + } > > + if (ret < 1 || has_string) > > continue; > > We used to say "if we know about the path we do not do anything > here, if we do not see any conflict in the file we do nothing, > otherwise we assign a new id"; we now say "see if we can parse > and also see if we have conflict(s); if we know about the path and > we cannot parse, drop it from the rr database (because otherwise the > entry will cause us trouble elsewhere later). Otherwise, if we do > not have any conflict or we already know about the path, no need to > do anything. Otherwise, i.e. a newly discovered path with conflicts > gets a new id". > > Makes sense. "A known path with unparseable conflict gets dropped" > is the important change in this hunk. > > > diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh > > index 8417e5a4b1..34f0518a5e 100755 > > --- a/t/t4200-rerere.sh > > +++ b/t/t4200-rerere.sh > > @@ -580,4 +580,26 @@ test_expect_success 'multiple identical conflicts' ' > > count_pre_post 0 0 > > ' > > > > +test_expect_success 'rerere with unexpected conflict markers does not crash' ' > > + git reset --hard && > > + > > + git checkout -b branch-1 master && > > + echo "bar" >test && > > + git add test && > > + git commit -q -m two && > > + > > + git reset --hard && > > + git checkout -b branch-2 master && > > + echo "foo" >test && > > + git add test && > > + git commit -q -a -m one && > > + > > + test_must_fail git merge branch-1 && > > + sed "s/bar/>>>>>>> a/" >test.tmp <test && > > + mv test.tmp test && > > OK, so the "only one conflict" in the log message meant just one > side of the conflict marker. More generally, the troublesome is > to have "conflict marker(s) that cannot be parsed" in the file. > > > + git rerere && > > + > > + git rerere clear > > +' > > + > > test_done