Junio C Hamano <gitster@xxxxxxxxx> writes:

> Unlike the above checkbox hack, "git blame" uses a real data
> structure to keep track of what came from where.  Instead of a
> global "this single path is what interests us now", it knows "in
> this commit, this is the path we are looking at", and when it looks
> at the parents of that commit, it checks where that path the child
> was interested in came from each different parent, and records a
> similar "in this commit (which is parent of the commit we were
> looking at), this path is what we are interested in".

FWIW, the above is greatly simplified.

For "git blame" to correctly handle a case like "This commit created
file F by taking pieces from files A, B, C, D, and E", and to annotate
the lines in file F, we need to keep track of the set of "lines n..m
of path A", "lines l..k of path B", etc., at commit X as the targets
of interest, and as we dig down the history, figure out where in the
parent commits of X each of these ranges of lines comes from.  So what
"blame" uses is much richer than just a single path per commit being
traversed (once the traversal passes through from a commit to all of
its parents, this list of "line ranges per path" can be released, so
that is not a huge memory burden even for a deep history).

Now "git log --follow" does not have to keep track of ranges of
lines, but if you start following from file F that was created by
concatenating pieces of multiple existing files A, B, ..., and E, you
either want to pick one of these 5 and follow it, or you replace F
with all five of these files and follow them from that point.

In any case, you need a richer data structure than the current
(ab)use of the .pathspec member during the traversal.
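To make the "set of paths of interest per commit" idea concrete, here is
a rough sketch in Python.  It is not how git implements anything; the
PARENTS and ORIGINS tables and the follow() helper are made up for
illustration, and ORIGINS stands in for whatever rename/copy detection
would report.  It takes the second option above (replace F with all of
the files it was built from) and drops each commit's set once it has
been handed down to the parents.

```python
from collections import defaultdict

# Toy history, purely illustrative: commit -> list of parent commits.
PARENTS = {
    "X": ["P"],
    "P": ["Q"],
    "Q": [],
}

# Hypothetical rename/copy detection result:
# (child, parent, path in child) -> paths in the parent it came from.
# Assumption: a path with no entry kept the same name in that parent.
ORIGINS = {
    ("X", "P", "F"): ["A", "B", "C", "D", "E"],  # F was built from five files
    ("P", "Q", "A"): ["old-A"],                  # A was renamed earlier
}


def follow(start_commit, start_path, order):
    """Walk `order` (children listed before their parents) and return,
    for each commit visited, the sorted paths being followed there."""
    # Per-commit set of paths of interest, instead of one global path.
    interest = defaultdict(set)
    interest[start_commit].add(start_path)

    followed = {}
    for commit in order:
        # Once a commit's set has been passed to its parents it is no
        # longer needed, so take it out of the table here.
        paths = interest.pop(commit, set())
        if not paths:
            continue
        followed[commit] = sorted(paths)
        for parent in PARENTS[commit]:
            for path in paths:
                sources = ORIGINS.get((commit, parent, path), [path])
                interest[parent].update(sources)
    return followed
```

With the toy data, follow("X", "F", ["X", "P", "Q"]) follows F at X,
then A through E at P, and then old-A plus B through E at Q.  Picking
only one of the five sources instead would amount to keeping a single
element of `sources` at the point where F is split.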