Detecting squash-merged branches (and question about git-diff-tree)

Max Gautier <mg@xxxxxxxxxxxxxxxx> · Tue, 3 Dec 2024 14:55:44 +0100

Hi,

I tend to work on projects that do a lot of "squash-merging", i.e. they
merge branches by having a robot squash the branch into a new commit on
top of the main branch.

This makes it a bit hard to find and remove my branches once they have
been "squash-merged", since `git branch --merged` does not list them.

I started a little script to detect such branches; initially I used git
cherry, but this only detects the case where the branch has 1 commit,
which is not enough.

Sharing it below in case anyone is interested and/or wants to give some
feedback (warning, this is probably full of bash-isms/GNU-isms).

Which leads me to my actual question:
I wanted to use diff-tree in --stdin mode (instead of calling it
repeatedly in a loop), feeding it my target branch and
the list of relevant commits, but apparently --merge-base and --stdin
are mutually exclusive. What's the reason for that?
I suppose it's related to the 3 possible line forms diff-tree accepts in
--stdin mode, but I didn't find a spelled out explanation in the
original thread implementing --merge-base [1].

Is there an alternative for computing the patch IDs of branches in that
way? A '%(mergebase)' token for git for-each-ref would also work, but
there is no such thing that I know of.
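
One possible direction, as a rough sketch only and not benchmarked: the
--stdin line form that takes a commit followed by explicit parents might
stand in for --merge-base, since diff-tree then diffs the commit against
the parent named on the line. It still costs one `git merge-base` call
per ref, but only a single diff-tree and a single patch-id process. The
function name patch_ids_vs_merge_base is made up for illustration.

```shell
#!/bin/bash
# Sketch: one diff-tree | patch-id pipeline for all candidate refs.
# Each stdin line is "<tip> <merge-base>" (full object names); diff-tree
# treats the second name as the parent of the first, so the patch it
# prints is the diff from the merge base to the tip.
patch_ids_vs_merge_base() {
    local target=$1 tip base
    shift
    for tip in "$@"; do
        # Resolve to a full commit id; skip anything that is not a commit.
        tip=$(git rev-parse --verify --quiet "${tip}^{commit}") || continue
        base=$(git merge-base "$target" "$tip") || continue
        printf '%s %s\n' "$tip" "$base"
    done | git diff-tree --stdin -p | git patch-id --stable
}
```

Called as `patch_ids_vs_merge_base upstream/HEAD <ref>...`, this should
print one "<patch-id> <tip commit>" line per ref whose diff against the
merge base is not empty.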

(Of course, the script as such runs ~reasonably~ well, but it does spend
95% of its time waiting for subprocesses, which bugs me a little^^)

Thanks for reading!

[1]: https://lore.kernel.org/git/cover.1599332861.git.liu.denton@xxxxxxxxx/

---

#!/bin/bash
# $1 : target ref (in which we search for squashed branches)
# (default: upstream/HEAD)
# ${@:2} (all script args after the first one): git for-each-ref
# patterns for candidate refs for squash-merge detection
# (default: refs/{remotes/origin,heads}/)

declare -A commit_by_patch_ids
oldest_merge_base=${1-upstream/HEAD}
ref_patterns=${@:2}
ref_patterns=${ref_patterns:-refs/remotes/origin/ refs/heads/}

for ref in $(git for-each-ref ${ref_patterns} \
                --format='%(objectname)' \
                --no-merged=${1-upstream/HEAD} )
do
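  patch_id=( $(git diff-tree -p --merge-base ${1-upstream/HEAD} $ref \
              | git patch-id --stable) )
  # patch-id prints "<patch-id> <commit-id>"; only the first field is used
  # as key. ${patch_id[0]} needs the braces: $patch_id[0] would expand to
  # the value followed by a literal "[0]". Skip empty diffs (no patch-id).
  [[ -n ${patch_id[0]-} ]] && commit_by_patch_ids[${patch_id[0]}]=$ref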
  # Caveat:
  # It's possible for different commits to have the same patch-id
  # (for instance on a feature branch do: git checkout feature;git branch
  # old;git rebase main -> old and feature would probably have the same
  # patch-id, if I understand this correctly)
  # Proper treatment of this would need an array of commits per
  # patch-id, but bash does not support multidimensional arrays.

  # Track the oldest commit we will need to go back to when checking
  # whether a patch-id exists in the target branch.
  # This assumes that branches are not squash-merged before their fork
  # point.  This avoids going back all the way to the first commit,
  # which can be prohibitively expensive on repositories with a long
  # history (e.g. the linux kernel tree takes 13 minutes on a recent
  # machine for git log -p | git patch-id)
  oldest_merge_base=$(git merge-base $oldest_merge_base $ref)
done

declare -a squashed
# Extract commits whose patch-id exists in the target branch.
for patch_id in $(git log -p ${oldest_merge_base}..${1-upstream/HEAD} \
                 | git patch-id --stable | cut -d ' ' -f 1)
do
    if [[ -n "${commit_by_patch_ids[$patch_id]+exists}" ]];then
        squashed+=("--points-at=${commit_by_patch_ids[$patch_id]}")
    fi
done

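# Without any --points-at argument for-each-ref would list every ref, so
# only call it when at least one squash-merged commit was found.
if (( ${#squashed[@]} > 0 )); then
    git for-each-ref $ref_patterns \
        --format='%(refname:short)' \
        "${squashed[@]}"
fi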
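
For the caveat in the script about several commits sharing one patch-id,
a possible workaround (sketch only, not wired into the script above) is
to store a space-separated list of commits in each associative-array
slot, since object names never contain spaces. The names
commits_by_patch_id, add_commit and points_at_args are made up for
illustration.

```shell
#!/bin/bash
# Sketch: emulate "array of commits per patch-id" with one
# space-separated string per associative-array key.
declare -A commits_by_patch_id

# add_commit <patch-id> <commit>: record one more commit for a patch-id.
add_commit() {
    local id=$1 commit=$2
    # Prepend a separating space only when the slot is already non-empty.
    commits_by_patch_id[$id]+="${commits_by_patch_id[$id]:+ }$commit"
}

# points_at_args <patch-id>: print one --points-at=<commit> argument per
# commit recorded for that patch-id (nothing if the key is unknown).
points_at_args() {
    local commit
    # Unquoted on purpose: word splitting separates the commits.
    for commit in ${commits_by_patch_id[$1]-}; do
        printf -- '--points-at=%s\n' "$commit"
    done
}
```

In the first loop this would replace the plain assignment with
`add_commit "${patch_id[0]}" "$ref"`, and the second loop would append
the output of points_at_args to the squashed array.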

-- 
Max Gautier