Re: Determining if a merge was produced automatically

Martin von Zweigbergk <martinvonz@xxxxxxxxx> · Sun, 30 Jun 2024 17:45:17 -0700

Forwarding to the list without HTML so others can correct me if I'm wrong.

On Sun, Jun 30, 2024 at 3:32 PM Martin von Zweigbergk
<martinvonz@xxxxxxxxx> wrote:
>
> On Sun, Jun 30, 2024, 11:06 Pavel Rappo <pavel.rappo@xxxxxxxxx> wrote:
>>
>> Hello,
>>
>> I'm looking for a robust way to determine if a given merge commit
>> could've been produced automatically by `git merge`, without any
>> manual intervention or tampering, such as:
>>
>>   - resolving conflicts,
>>   - stopping (`--no-commit`) and modifying the result,
>>   - amending the commit.
>>
>> My initial idea was to re-enact the merge. If the merge failed, I
>> would conclude that the original merge couldn't have been produced
>> automatically. If the merge succeeded, I would compare it with the
>> original merge. Any differences would indicate that the original merge
>> couldn't have been produced automatically. Otherwise, I would conclude
>> that it could've been. This approach is simple, but involves multiple
>> steps and requires clean-up.
>>
>> My second idea was to use `git show --diff-merges=dense-combined`,
>> which only prints hunks that come from neither parent. If nothing is
>> printed, I would conclude that the merge could've been produced
>> automatically. This approach is simple and single-step, but seems to have
>> an issue. In my experiments, I found that if hunks from different
>> parents were close enough together, output was produced. So, checking
>> for empty output could lead to false negatives: a merge that
>> could've been produced automatically might look like it was tampered
>> with.
>>
>> My third idea was to use a recently added feature, `git show
>> --remerge-diff`, which seemingly embodies my first idea and is immune
>> to the issue of the second. It is also single-step and requires no
>> clean-up:
>>
>> > Remerge two-parent merge commits to create a temporary tree object—potentially containing files with conflict markers and such. A diff is then shown between that temporary tree and the actual merge commit.
>>
>> However, this caveat suggests that I shouldn't rely too heavily on its output:
>>
>> > The output emitted when this option is used is subject to change, and so is its interaction with other options (unless explicitly documented).
>
> There's essentially only one way to display an empty diff, so I suspect that checking whether the diff is empty will still be enough for your purposes.
>
> Note that you can specify options such as the rename detection threshold to use while merging, and the person who made the merge might have used a different threshold from the one you use when checking whether they added other changes. There is also a choice of merge strategies and diff algorithms. As a result, you might get false positives and false negatives. Maybe that's still good enough for you.
>
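One way to account for those settings is to do the first idea literally, in a throwaway worktree, and pass extra `git merge` options through (for example `-X find-renames=90%`) to try the settings the merger might have used. A sketch under those assumptions; `reenact_merge` is a hypothetical helper name, and it succeeds only when a clean merge with the given options reproduces the recorded tree.

```shell
#!/bin/sh
# Sketch: reenact_merge is a hypothetical helper, not a Git command.
# Usage: reenact_merge <merge-commit> [git-merge-options...]
# Returns 0 only if a clean merge of the two parents reproduces the recorded tree.
reenact_merge() {
  merge=$(git rev-parse --verify "$1^{commit}") || return 2
  shift
  tmp=$(mktemp -d) || return 2
  # Check out the first parent in a detached scratch worktree.
  git worktree add --quiet --detach "$tmp" "$merge^1" || return 2
  # --no-commit stops before committing, so no identity is needed and the
  # merged result is left in the scratch index.
  if git -C "$tmp" merge --quiet --no-commit --no-ff "$@" "$merge^2" >/dev/null 2>&1; then
    tree=$(git -C "$tmp" write-tree)
  else
    tree=
  fi
  # Clean up; --force because a merge may still be in progress there.
  git worktree remove --force "$tmp"
  [ -n "$tree" ] && [ "$tree" = "$(git rev-parse "$merge^{tree}")" ]
}
```

For example, `reenact_merge HEAD -X find-renames=90%`. Trying a few thresholds narrows, but cannot rule out, the false positives and negatives mentioned above.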
>>
>> What is my best course of action?
>>
>> Thanks,
>> -Pavel
>>