Re: Can git log be made to "follow" in the same way as git blame? Why / in what way is "--follow" broken or limited?

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 26 Aug 2024 15:52:30 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Unlike the above checkbox hack, "git blame" uses a real data
> structure to keep track of what came from where.  Instead of a
> global "this single path is what interests us now", it knows "in
> this commit, this is the path we are looking at", and when it looks
> at the parents of that commit, it checks where, in each parent, the
> path the child was interested in came from, and records a
> similar "in this commit (which is parent of the commit we were
> looking at), this path is what we are interested in".

FWIW, the above is greatly simplified.  For "git blame" to correctly
handle a case like "This commit created file F by taking pieces from
files A, B, C, D, and E", and annotating the lines in file F, we
need to keep track of the set of "lines n..m of path A", "lines l..k
of path B", etc., at commit X as the targets of interest, and as we
dig down the history, figure out where in the parent commits of X
each of these ranges of lines comes from.  So what "blame" uses is
much richer than just a single path per commit being traversed (once
the traversal has moved from a commit to all of its parents, this
list of "line ranges per path" can be released, so it is not a huge
memory burden even for a deep history).
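To make the "lines n..m of path A at commit X" bookkeeping concrete, here is a toy sketch in Python. It is not how git implements blame. The `moves` table (which lines of a child's path were taken from which lines of a parent's path), the `blame` function, and the children-first `order` list are all hypothetical; real blame works this out by diffing each commit against its parents.

```python
from collections import defaultdict

# Toy model, not git's internals: moves[(child, parent)] lists segments
# (path, start, end, parent_path, parent_start), meaning lines
# start..end (1-based, inclusive) of `path` in `child` were taken from
# `parent_path`, beginning at line `parent_start`, in `parent`.
# Real blame derives this by diffing; here it is simply given.

def blame(parents, moves, order, start_commit, path, nlines):
    """Return (commit, path, first, last) for every blamed line range.

    `order` must list commits children-first, so that all targets for
    a commit have been collected before that commit is processed.
    """
    targets = defaultdict(list)            # commit -> [(path, first, last)]
    targets[start_commit].append((path, 1, nlines))
    blamed = []
    for commit in order:
        # pop() drops this commit's targets once they are handed on
        for p, s, e in targets.pop(commit, []):
            left = [(s, e)]                # ranges no parent explains yet
            for parent in parents.get(commit, []):
                for cp, cs, ce, pp, ps in moves.get((commit, parent), []):
                    if cp != p:
                        continue
                    still_left = []
                    for ls, le in left:
                        lo, hi = max(ls, cs), min(le, ce)
                        if lo > hi:        # no overlap with this segment
                            still_left.append((ls, le))
                            continue
                        # overlap: pass it to the parent, shifted by offset
                        targets[parent].append((pp, ps + lo - cs, ps + hi - cs))
                        if ls < lo:
                            still_left.append((ls, lo - 1))
                        if le > hi:
                            still_left.append((hi + 1, le))
                    left = still_left
            # whatever no parent accounts for originated in this commit
            blamed.extend((commit, p, ls, le) for ls, le in left)
    return blamed

# X created F: lines 1-3 from A:10-12, lines 4-5 from B:1-2, line 6 new.
parents = {"X": ["W"], "W": []}
moves = {("X", "W"): [("F", 1, 3, "A", 10), ("F", 4, 5, "B", 1)]}
for entry in blame(parents, moves, ["X", "W"], "X", "F", 6):
    print(entry)
```

In this example, lines 1-3 of F are traced to lines 10-12 of A and lines 4-5 to lines 1-2 of B (both blamed on the root commit W), while line 6 stays with X, which wrote it. Parents are tried in order, so in this toy a line that several parents could explain goes to the first one that claims it.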

Now "git log --follow" does not have to keep track of ranges of
lines, but if you start following from file F that was created by
concatenating pieces of multiple existing files A, B, ..., and E,
you either want to pick one of these five and follow it, or to
replace F with all five of them and follow them from that point.
Either way, you need a richer data structure than the current
(ab)use of the .pathspec member during the traversal.
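A toy Python sketch of the difference between one global path of interest and a per-commit set of paths follows. Everything here is hypothetical: the `sources` table, both function names, and the commit names. `follow_global` is only a crude stand-in for a single shared pathspec, not a transcription of what git does. Both functions record which paths matter at each commit; neither checks whether a commit actually touched them.

```python
from collections import defaultdict

# Toy model, not git's internals.  sources[(child, parent)] maps a path
# in `child` to the list of paths in `parent` it was built from.  A path
# missing from the map is assumed unchanged in that parent; an empty
# list means it was created from scratch there.  Real git would have to
# derive this from rename/copy detection.  `order` lists commits
# children-first.

def follow_global(parents, sources, order, path):
    """Crude model of one pathspec shared by the whole traversal."""
    current, log = path, []
    for commit in order:
        log.append((commit, [current]))
        for parent in parents.get(commit, []):
            found = sources.get((commit, parent), {}).get(current, [current])
            if found:
                current = found[0]   # room for one path only: keep the first
    return log

def follow_paths(parents, sources, order, start, path):
    """Keep a separate set of interesting paths for each commit."""
    interest = defaultdict(set)
    interest[start].add(path)
    log = []
    for commit in order:
        paths = interest.pop(commit, set())   # freed once passed on
        if not paths:
            continue
        log.append((commit, sorted(paths)))
        for parent in parents.get(commit, []):
            edge = sources.get((commit, parent), {})
            for p in paths:
                interest[parent].update(edge.get(p, [p]))
    return log

# X created F by concatenating A..E; E itself was new in W.
concat = ({"X": ["W"], "W": ["V"], "V": []},
          {("X", "W"): {"F": list("ABCDE")}, ("W", "V"): {"E": []}},
          ["X", "W", "V"])
# M merges P1 (which renamed old.c to new.c) and P2 (still has old.c).
merge = ({"M": ["P1", "P2"], "P1": ["R"], "P2": ["R"], "R": []},
         {("M", "P2"): {"new.c": ["old.c"]}, ("P1", "R"): {"new.c": ["old.c"]}},
         ["M", "P1", "P2", "R"])

print(follow_global(*concat, "F"))
print(follow_paths(*concat, "X", "F"))
print(follow_global(*merge, "new.c"))
print(follow_paths(*merge, "M", "new.c"))
```

With the concatenation history, the global version can only "pick one of these five" (it keeps A), while the per-commit set replaces F with A through E and drops E once it reaches the commit that created it. With the merge history, the global version reports old.c for P1, where the file is actually new.c, because the rename seen on the M-to-P2 edge overwrote the one shared path.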