On Sun, Feb 05, 2023 at 09:47:03PM +0100, Ævar Arnfjörð Bjarmason wrote: > From: Jeff King <peff@xxxxxxxx> I appreciate being credited, of course, but at some point I think this becomes "Based-on-a-patch-by". And we may have crossed the line here. > The "blame-tree" library allows for finding the most recent > modification to paths in a tree. It does so by expanding the tree at a > given commit, taking note of the current state of each path, and then > walking backwards through history looking for commits where each path > changed into its final sha1. > > The "git blame-tree" command was first noted on the ML 2011[1], and > over the years there have been mentions of it[2][3], and whether it > could be upstreamed. The sources have been available at [4]'s > "jk/blame-tree-wip" and "jk/faster-blame-tree-wip" branches. Sort of. jk/blame-tree-wip probably matches what I sent to the list in 2011 (and that probably matches what was deployed at GitHub at the time). And like all of my branches, I continually rebase it on git.git's master branch. But like all branches in my repo with "-wip" in them, it is not part of my daily driver, and I generally do not even build it (and I would not be surprised if it does not build at all, or one of those rebases introduced a horrible bug). The jk/faster-blame-tree-wip was an experiment from 2014 to narrow the pathspec as we found answers. But it was never deployed anywhere, and likewise may or may not even build now. I think the general idea there is sound (make pathspec lookups faster by using a trie, and then narrow it), but I suspect it's not a full solution. In particular, I don't think it does anything clever with merges. Since we are narrowing the pathspec as we traverse, we can't rely on the usual pruning of side branches that happens via limit_list(), as that is all up-front. So it probably needs to look at each merge as we traverse and cull parents based on TREESAME. I believe GitHub later had patches to do that, but they didn't use the pathspec machinery (because without the tries, it becomes accidentally quadratic in the number of paths). I didn't work on those patches, though (Stolee and Taylor did), and they're not anywhere in my repository (and I no longer have access to the private github repo). > This change attempts to start upstreaming this command, but rather > than adding a command whose interface & semantics may be controversial > starts by exposing & testing the core of its library through a new > test helper. > > An eventual "git blame-tree" command, or e.g. a new format for "git > ls-tree" to show what a path "blames" to can then be implemented with > this library. OK. It's a little weird, as I do think the interface and semantics are the more interesting part, but this certainly isn't hurting anybody to go this route. > * Removing the "--max-depth" changes to the diff code. We'll need > those eventually, but it's not required for a blame of a given list > of paths. > > As has been noted in previous on-list discussions the semantics of > the "max-depth" changes might be controversial, so it's worthwhile > to split those out so that they can be reviewed separately. That's probably reasonable. The only two interesting "depths" are really "recurse" and "don't recurse" (where "recurse" is probably what you'd put in a user-facing tool, and "don't recurse" is what a site like GitHub uses to do the blame for a single level of tree it's showing). And that narrows the problem space quite a bit. > * Made the "blame-tree" helper take "--" before any revision options, > for clarity. An eventual built-in command (if any) probably doesn't > want to enforce this, but it makes it clearer in the test helper > what's an argument for "blame-tree" itself, and what's an argument for > the revision machinery. OK. Since this is just a test-helper, we don't care too much either way. > * Minor updates for using C99 syntax, and "size_t" instead of "int" > when we're iterating over types whose "nr" is that size. Reasonable. > * Avoid sub-shelling in the tests, use "test-tool -C .." instead. Yeah, the original code probably predates "-C". ;) > The range-diff here is to peff's jk/blame-tree-wip. As noted above > this is far from the full thing, but hopefully getting the basic bits > of the library (sans the max-depth question) will make the review of > subsequent bits easier. I doubt this range-diff is useful to anybody. I'm probably the person most likely to make sense of it, and it means nothing to me. It probably makes sense conceptually to just treat this as a new topic that happens to be based on older work. But if you did want to base it on something, you probably ought to do so on what GitHub is currently running in production (and again, talk to Taylor or Stolee for that). Unless the intent is to use this as a base for showing their changes. Though even then, I'm not sure the intermediate state is all that interesting. > [oodles of patch] TBH, I didn't really look at this closely. It's been a decade, so even for me reviewing this would basically be looking at it from scratch. And as my Git time is a bit limited these days, I can't really promise timely review of a big topic. -Peff