Jeff King <peff@xxxxxxxx> writes: > I considered making it a special mode of "git blame" when blame is fed a > directory instead of a file... ... If we want to go that way, > we can always catch the special case in blame and just exec blame-tree. I agree that rolling it into "blame" will be a good way to conclude this feature from the end-user UI point of view, provided if (as you said) this proves to be the most reasonable way to give a coarser, file-level blame to a history. It has been my opinion that the default github view that seems to be modeled after cvsview is the least useful choice of the default in the more modern git era ;-), but I guess people's eyes and brains are trained by the old school "file boundaries matter" way of thinking. I am not against catering to them as long as it is done in a modular way that does not negatively affect the other parts of the code, and this implementation certainly is a good one from a cursory reading, except for two points. One is merges, which you punted on with --no-merges ;-). If I were doing this, I would probably have traversed without --no-merges, and would blame all cases that are not TREESAME to the merge (i.e. regardless of the cleanness of the content level merge), as this feature is about file level blame. If the child took the whole content as-is from a parent, it is a no-event, but if the child has the resulting content that is different from any of its parents, then that child is responsible for the end-result content at the file level. Doing nothing else makes sense. There is a fundamental expectation for file level history: if you give a commit C as the result for path F, saying C was the one that touched path F the last, you are saying that F that is contained in C is exactly the same as F at the tip of the history (i.e. whereever you started digging back from). If you don't stop at a merge that had content-level merge, you would break that expectation. You would also rely on the local clock of the merged branches to say which branch made a commit touching the path the last, but it is much minor issue compared to the breakage of "result must match where we say it came from" expectation. > The initial set of interesting files we come up with is gotten by > looking at the tree of the first pending object after parsing the rev > options (defaulting to HEAD). Which sounds a little flaky to me, but > does what you want in practice. I'd be curious if somebody can come up > with a counterexample where the ability to manually specify the source > tree would be more useful. I started writing "In what sense is it flaky? What corner case do you have in mind?" in a few lines above and then moved that here ;-). The second point is "why didn't you use pathspec, but chose to take only one path?" It would be useful if you can say $ git blame v2.6.24..v2.6.37 -- net include/net no? > So this is the per-commit processing. Basically we just do a diff for > each commit, and see if each path has been claimed. Note that it > depends on the string-list item->util being initialized to zero. Hence > my recent patch to string-list, and this needs to go on top of 62b8102 > (which is in master and maint). Once you know who the guilty party is, can't you just remove that element from the list, so that later search from the "remaining path" list would decrease in cost as you go deeper? Also at some point the set of remaining path would become small and it might make sense to feed that as diff-tree pathspec. After all, aren't you re-initializing the diff machinery for each commit (which is _not_ a bad thing to do--I am just pointing out that there is an opportunity to use narrower pathspecs as you go without any additional cost)? Note that I am not suggesting to update the pathspec used for commit traversal in the middle; that is a lot trickier. >> + while (data.num_interesting > 0 && > > An optimization; we can quit the traversal as soon as everything is > claimed. In practice this helps disappointingly little. That is somewhat unexpected, but would strongly suggest that the approach to use more specific pathspec that cover only the remaining paths would give you faster per-commit diff? > My merge handling is just "which files are different from the parents". > Which is reasonable, but I don't actually exercise it since we use > --no-merges by default. :) I think that is a big mistake and I already said it above. > We could try to do something clever here about evil merges. Again, I wouldn't even think about it; this is a "file boundary matters" tool; you shouldn't care about cleanliness of content-level merges. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html