Re: Bug in 'git log --remerge-diff' when used with '--find-object' and '--submodule=log|diff'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 29, 2022 at 9:36 AM Philippe Blain
<levraiphilippeblain@xxxxxxxxx> wrote:
>
> Hi Elijah,
>
> Le 2022-08-24 à 03:36, Elijah Newren a écrit :
> > Hi Philippe,
> >
> > On Mon, Aug 22, 2022 at 4:58 PM Philippe Blain
> > <levraiphilippeblain@xxxxxxxxx> wrote:
> >>
> >> Hi Elijah,
> >>
> >> I found two bugs in '--remerge-diff' when combined with both '--find-object' and
> >> '--submodule=log|diff'. I don't know if they have the same cause.
> >>
> >> When using these flags together, *all* 2-parents merge commits are shown (even in a repo with
> >> no submodule!)
> >>
> >> Moreover, for merges with conflicted paths, all conflicted paths are shown as "(new submodule)",
> >> even if they are not a submodule (in fact, even when no submodule is present
> >> in the repository!).
> >>
> >> This artificial example reproduces the bug:
> >>
> >> ```bash
> >> #!/bin/bash
> >>
> >> set -euEo pipefail
> >>
> >> repo=repro
> >> rm -rf $repo
> >>
> >> TEST_AUTHOR_LOCALNAME=author
> >> TEST_AUTHOR_DOMAIN=example.com
> >> GIT_AUTHOR_EMAIL=${TEST_AUTHOR_LOCALNAME}@${TEST_AUTHOR_DOMAIN}
> >> GIT_AUTHOR_NAME='A U Thor'
> >> GIT_AUTHOR_DATE='1112354055 +0200'
> >> TEST_COMMITTER_LOCALNAME=committer
> >> TEST_COMMITTER_DOMAIN=example.com
> >> GIT_COMMITTER_EMAIL=${TEST_COMMITTER_LOCALNAME}@${TEST_COMMITTER_DOMAIN}
> >> GIT_COMMITTER_NAME='C O Mitter'
> >> GIT_COMMITTER_DATE='1112354055 +0200'
> >> export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
> >> export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
> >> export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
> >> export HOME=
> >>
> >> git -c init.defaultBranch=unimportant init $repo
> >> cd $repo
> >> echo i>file
> >> git add file
> >> git commit -m file
> >> git checkout -b side
> >> echo s>>file2
> >> git add file2
> >> git commit -am side
> >> git checkout -
> >> echo m >>file
> >> git commit -am main
> >> git merge side -m clean
> >> git checkout side
> >> echo c>>file
> >> git add file
> >> git commit -am side2
> >> git checkout -
> >> echo cc>>file
> >> git commit -am main2
> >> git merge side || true
> >> printf 'i\nm' > file
> >> git commit -am conflicted
> >> # look for a dummy object
> >> git log --oneline --diff-merges=r --submodule=log --find-object=2c042ac4213768e55791098110d2ef2ef845881a
> >> # same output with --submodule=diff
> >> ```
> >>
> >> This is the output I get from the 'git log' call:
> >>
> >> b4f1be9 (HEAD -> unimportant) conflicted
> >> Submodule file 0000000...0000000 (new submodule)
> >> a4ef223 clean
> >>
> >> Notice both merges are shown despite none of them changing the number of occurences of
> >> 2c042ac4213768e55791098110d2ef2ef845881a, which does not even exist in this repository,
> >> and also that the conflicted merge shows 'file' as "new submodule".
> >
> > Thanks for the report, and the steps to reproduce.  Very helpful.
> >
> > After some digging, it appears the remerge-diff headers are
> > misinterpreted by the submodule code.  They transform what would have
> > been the following output:
> >
> >     b4f1be9 (HEAD -> unimportant) conflicted
> >     diff --git a/file b/file
> >     remerge CONFLICT (content): Merge conflict in file
> >     a4ef223 clean
> >
> > into what you saw, namely
> >
> >     b4f1be9 (HEAD -> unimportant) conflicted
> >     Submodule file 0000000...0000000 (new submodule)
> >     a4ef223 clean
> >
> > We can recover the intended remerge-diff header with the following simple patch:
> >
> > diff --git a/diff.c b/diff.c
> > index 974626a621..be23f66057 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -3429,14 +3429,16 @@ static void builtin_diff(const char *name_a,
> >
> >         if (o->submodule_format == DIFF_SUBMODULE_LOG &&
> >             (!one->mode || S_ISGITLINK(one->mode)) &&
> > -           (!two->mode || S_ISGITLINK(two->mode))) {
> > +           (!two->mode || S_ISGITLINK(two->mode)) &&
> > +           (one->mode || two->mode)) {
> >                 show_submodule_diff_summary(o, one->path ? one->path :
> > two->path,
> >                                 &one->oid, &two->oid,
> >                                 two->dirty_submodule);
> >                 return;
> >         } else if (o->submodule_format == DIFF_SUBMODULE_INLINE_DIFF &&
> >                    (!one->mode || S_ISGITLINK(one->mode)) &&
> > -                  (!two->mode || S_ISGITLINK(two->mode))) {
> > +                  (!two->mode || S_ISGITLINK(two->mode)) &&
> > +                  (one->mode || two->mode)) {
> >                 show_submodule_inline_diff(o, one->path ? one->path : two->path,
> >                                 &one->oid, &two->oid,
> >                                 two->dirty_submodule);
> >
> > In other words, when we have information about something other than a
> > submodule, don't attempt to treat it as information about a submodule.
>
> Thanks for digging into this.
> From what I understand in the case of a remerge-diff, both modes are all-zero, and this is
> not expected by the submodule diff code. Were you planning to submit a proper
> patch ? I could get to it eventually, but not before mid/end of September...

I intend to submit a proper patch; I've just been busy.

> >
> > Now, whether the remerge-diff header should be shown is an interesting
> > one -- it's a carry on to the discussion in the thread at [1], and
> > isn't simple to answer.
> > [1] https://lore.kernel.org/git/CABPp-BHjU+wDXNnf-rsGy86GvOFWH6qVLPEfAA2JDfLFRU4WSA@xxxxxxxxxxxxxx/
>
> Thanks for a link to a very interesting discussion! However I'm not exactly sure
> what you meant here.

I'm just saying there are special cases where it's difficult to
determine whether to include the "remerge CONFLICT" diff headers.  For
basic cases, they are obviously wanted when running under
"--remerge-diff" since that mode was designed to show that kind of
information.  I think below you were trying to get at the fact that
remerge headers will often be accompanied by one or more diff hunks
which show content changes in the relevant file...and implying that if
there are no such diff hunks (because other flags caused those to be
filtered out), then the remerge header should also be omitted.
Unfortunately such a strategy would only work if we could replace
"often" with "always" in that last sentence.  We can't do so; there
are cases where remerge headers are accompanied by zero diff hunks and
those headers need to be shown despite there being zero related diff
hunks.  (A pretty common example: imagine a simple modify/delete
conflict resolved in favor of keeping the modified file.  The
resulting file recorded in the merge commit exactly matches the
auto-merged one so there are no diff hunks to show, but a `git show
--remerge-diff $MERGE_COMMIT` should definitely show the modify/delete
conflict header since the merge was conflicted and the user did work
to resolve it.)  In fact, my first implementation of "git show
--remerge-diff" did only include the remerge headers when the diff for
the corresponding file was non-empty, but I almost immediately noted
several examples in my initial testing where doing so was causing
important information to be incorrectly filtered out.

[...]
> Because if I do add '-p', (in my reproducer) I still do not get any diff content. But if I remove
> '--find-object', then I do...

--remerge-diff turns on -p implicitly, so adding an explicit '-p' is
not a useful test here.

--find-object does some filtering, but how is the portion of the code
responsible for adding the remerge headers supposed to know whether
such filtering has happened and if it's relevant to any conflict
headers that it needs to add?  I'm not sure how much information about
the command line flags or what filtering happened is available at that
point in the code.  And, as noted above, the lack of diff hunks is not
a usable criteria for judging whether to include the header.

> > As for the first bug, the showing of any 2-parent commits, yes that's
> > true.  But it's not limited to remerge-diff; you can change
> > --diff-merges=r to --diff-merges=c or --diff-merges=cc and see the
> > same thing.  I'm not sure if that means that my looking at the
> > do_diff_combined() logic when developing the do_remerge_diff()
> > function meant I copied a bug, or if I independently came up with it
> > on my own.  But, for now, I need some sleep.  Just thought I'd give
> > you an update on what I have found so far, though.
> >
>
> Yes, I had noticed that I had the same behaviour with --cc or -c, I forgot to mention it.
> With 'separate' or 'first-parent', though, no commits are shown (as expected).
> To be clear, this behaviour is independent of the use of '--submodule=short|log|diff'.

Yep, the modes that treat merges as actual merges instead of
pretending they are regular commits are precisely the modes which
exhibit this behavior.  I'll have to dig into why.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux