Re: How to merge by subtree while preserving history?

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 27 Mar 2009 10:20:36 -0700

David Reitter <david.reitter@xxxxxxxxx> writes:

> ...  Is there a command that gives me
> the diff  for a revision pair, restricted to what happened to content
> in a given file in the current tree?

You can get half of it from blame (and I presume the other half by
running the procedure in reverse).

"git blame" has an obscure switch -S that lets you lie about the ancestry
by allowing you to install a graft (this is primarily used by the annotate
operation of git-cvsserver).

Suppose you have revisions A and B, and a lot of code in a file F in the
original revision A migrated to many other places in a later revision B
over time.  You want to see where each and every line in F from A ended up
in B.

To compute this, you pretend as if the history originates at B (i.e. B is
the root commit), and A is a direct descendant of it, and blame each and
every line of F in A, with a very aggressive setting.  E.g.

	{
		echo $(git rev-parse A) $(git rev-parse B)
		echo $(git rev-parse B)
	} >tmp-graft
	git blame -C -C -C -w -S tmp-graft A -- F

I'll leave it as an exercise to the readers how to compute "where did each
and every line in G in B come from in A?"
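
One possible answer, as an untested sketch: swap the roles in the graft, so
that A becomes the root commit and B its direct descendant, and blame G in
B.  Lines attributed to A would then be the ones that came from A, and lines
attributed to B the ones that did not.  The helper name write_graft is made
up for illustration; A, B and G are placeholders as before.

```shell
#!/bin/sh
# Hypothetical helper (not an existing git command): write a two-line
# graft file that makes commit $1 a direct child of commit $2, and
# makes $2 a root commit.  $3 is the file to write.
write_graft () {
	{
		echo "$1 $2"
		echo "$2"
	} >"$3"
}

# Assumed usage for the reverse question:
#   write_graft "$(git rev-parse B)" "$(git rev-parse A)" tmp-graft
#   git blame -C -C -C -w -S tmp-graft B -- G
```

Calling write_graft with the arguments the other way around gives the same
graft file as the forward example above.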

Note that in order for this to work, it needs a fix to "blame -S" that I
posted about 10 days ago: aa9ea77 (blame: read custom grafts given by -S
before calling setup_revisions(), 2009-03-18); the fix is sitting in 'pu',
because as far as I know nobody has cared about the breakage other than me,
at least until now.

I've attached a script that uses this trick to compute "How much of what
Linus originally wrote still survives."  People who attended GitTogether'08
may have seen the result.

---

#!/bin/sh
# How much of the very original version from Linus survives?

_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"

initial=$(git rev-parse --verify e83c5163316f89bfbde7d9ab23ca2e25604af290) &&
this=$(git rev-parse --verify "${1-HEAD}^0") || exit

tmp="/var/tmp/Linus.$$"
trap 'rm -f "$tmp".*' 0

# We blame each file in the initial revision pretending as if it is a
# direct descendant of the given version, and also pretend that the
# latter is a root commit.  This way, lines in the initial revision
# that survived to the other version can be identified (they will be
# attributed to the other version).
graft="$tmp.graft" &&
{
	echo "$initial $this"
	echo "$this"
} >"$graft" || exit

opts='-C -C -C -w'

git ls-tree -r "$initial" |
while read mode type sha1 name
do
	git blame $opts --porcelain -S "$graft" "$initial" -- "$name" |
	sed -ne "s/^\($_x40\) .*/\1/p" |
	sort |
	uniq -c | {
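		# There are only two commits in the fake history, so
		# there will be at most two lines of output from the above.
		read cnt1 commit1
		read cnt2 commit2
		# A missing line (or no output at all, for an empty
		# file) leaves its count empty; treat that as zero.
		cnt1=${cnt1:-0} cnt2=${cnt2:-0}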
		if test "$initial" != "$commit1"
		then
			cnt_surviving=$cnt1
		else
			cnt_surviving=$cnt2
		fi
		cnt_total=$(( $cnt1 + $cnt2 ))
		echo "$cnt_surviving $cnt_total	$name"
	}
done | {
	total=0
	surviving=0
	while read s t n
	do
		total=$(( $total + $t )) surviving=$(( $surviving + $s ))
		printf "%6d / %-6d	%s\n" "$s" "$t" "$n"
	done
	printf "%6d / %-6d	%s\n" $surviving $total Total
}
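
The counting step in the middle of the script can be tried on its own.  This
is a minimal sketch, assuming the usual --porcelain layout where each blamed
line is introduced by a header line starting with the 40-hex commit name and
the line content itself starts with a tab.  The function name
count_blamed_lines is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the script above): read
# "git blame --porcelain" output on stdin and print "<count> <commit>"
# for each commit that at least one line was attributed to.
count_blamed_lines () {
	# Forty hex digits, built the same way as _x40 in the script.
	x5='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
	x40="$x5$x5$x5$x5$x5$x5$x5$x5"
	# Keep only the commit name from each per-line header, then
	# count how many lines each commit got.
	sed -ne "s/^\($x40\) .*/\1/p" |
	sort |
	uniq -c |
	while read cnt commit
	do
		echo "$cnt $commit"
	done
}

# Assumed usage, with the graft file prepared as in the script:
#   git blame -C -C -C -w --porcelain -S "$graft" "$initial" -- "$name" |
#   count_blamed_lines
```

With the two-commit fake history this prints at most two lines, which is
what the pair of "read" calls in the script relies on.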
