Re: [RFC] git blame-tree

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 02 Mar 2011 15:22:14 -0800

Jeff King <peff@xxxxxxxx> writes:

> With sensible semantics, we can turn off --no-merges and still get good
> results. And anybody who cares for something different can use
> --no-merges themselves. Though I expect the opposite to be the case; I
> can imagine somebody wanting --first-parent to blame files only to
> merges.  I'll have to make sure that works properly in my re-roll.

I still do not understand why anybody would even want to say --no-merges.

What does this even mean?

	git blame --no-merges master..pu -- builtin | builtin/push.c

It may find ab/i18n or mm/push-default-advice depending on the timestamp,
by ignoring a content-level merge but is it a useful information?

> I was thinking something like:
>
>   git blame-tree dir branch1 branch2
>
> where branch2 is actually _ahead_ of branch1. We take the file list from
> branch1, but may blame commits on branch2. It's probably a little bit of
> a crazy thing to ask for, though.

To be compatible with all the git commands, the order of parameters should
be revs first and then pathspec, i.e. "git blame-tree branch1 branch2 dir".
But that is a tangent.

Again, the same "what does that mean" applies.

>> The second point is "why didn't you use pathspec, but chose to take only
>> one path?"  It would be useful if you can say
>> 
>> 	$ git blame v2.6.24..v2.6.37 -- net include/net
>> 
>> no?
>
> ...  So in the
> traversal above, if 2.6.25 removes net/foo.c, then a regular traversal
> will think that's interesting and mention it.  But for blame-tree, we
> don't want that. We want to mention net/foo.c only if it existed at the
> beginning.

Of course, if v2.6.37 that is the endpoint of the history (i.e. "at the
beginning" above) didn't have net/foo.c, then we shouldn't show it.

But that doesn't change my question.  Why are you taking a single
directory and doing "ls-tree $endpoint $that_directory" to populate the
set of paths to find who touched them last, instead of using the args as
regular pathspecs and running "ls-tree $endpoint -- net include/net" to
populate the set?

> Of course it would be easy enough to parse the revision parameters, then
> create our initial path list by combining the initial tree and the
> pathspecs....

As I said already, it does not make any sense to have more than one
endpoint to begin with when you are asking "what is the last one that
touched things", and that also applies to the regular blame, which I think
already has a logic to make sure you have only one positive commit in the
starting rev list.

> ... When net/subdir/foo.c changes, we do care, we just say
> "net/subdir" changed.

I think you would want diff-tree with or without -r then.  I suspect
people may want to pass --recursive to this command.

> The expensive part is actually looking at the tree and doing even the
> raw diff for each commit.

... which may probably be helped by limiting the recursion of that diff
with pathspec, but I realize that you are not by default running recursive
diff, so...

It also occurs to me that it may also be a plausible implementation to
hook this into the usual revision traversal machinery.

> I'm not sure if we do that optimization. Even if we do, I'm not sure how
> much practical difference it will make. I expect most of the time is
> spent getting the tree out of storage at all, not walking the entries.

Yeah, my "pathspec" comment primarily comes from my expectation that this
would want to be recursive.  If you are only scanning a single flat tree
without recursing, then there is no need.

> And then I _will_ need to feed the proper pathspecs to diff. Because it
> won't be a matter of looking at a few extra entries in a tree we have
> already pulled from storage. We need to tell diff not to delve into
> those extra trees at all...

Yes.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html