Re: On blame/pickaxe

Junio C Hamano <junkio@xxxxxxx> · Sun, 15 Oct 2006 23:43:55 -0700

Petr Baudis <pasky@xxxxxxx> writes:

>   Thanks for the nice writeup!
>
> Dear diary, on Fri, Oct 13, 2006 at 03:43:46AM CEST, I got a letter
> where Junio C Hamano <junkio@xxxxxxx> said that...
>> When done with one parent, if you are a merge, you will then try
>> to pass the blame on the remaining part that you are still
>> suspected for to other parents.
>
>   (This got me nervous but I guess it actually makes sense - if only one
> parent modified a line, it's right to pass the blame up; else if you
> took one parent's version verbatim, it's right to pass the blame up as
> well (I think); else you've already got the blame assigned to the merge
> commit and everything is all right.)

Well, this part is the classic blame algorithm.  The beauty of
it is that a merge case falls out as a natural consequence of
the one-parent case and you do not have to do anything special.
You either inherited a line from your parent (or one or more of
your parents) or you created the line yourself.  If you subtract
what you can blame your parents for, the remainder is what you
introduced.  The number of parents you have does not have any
effect on that logic.
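
To make the subtraction concrete, here is a minimal sketch in Python.
It is not the C code in git: split_blame and its arguments are
made-up names, and difflib stands in for the real diff machinery.
Each parent is offered only the lines the commit is still suspected
for, and whatever no parent accepts is what the commit introduced.

```python
from difflib import SequenceMatcher


def split_blame(child_lines, parent_versions):
    """Split a commit's lines into inherited and introduced ones.

    Hypothetical helper for illustration only.  Returns a pair
    (inherited, introduced): inherited maps a child line index to the
    index of the parent that takes the blame for it, and introduced is
    the sorted list of child line indexes left with the commit itself.
    """
    suspected = set(range(len(child_lines)))
    inherited = {}
    for parent_index, parent_lines in enumerate(parent_versions):
        if not suspected:
            break  # nothing left to pass on to later parents
        matcher = SequenceMatcher(None, parent_lines, child_lines,
                                  autojunk=False)
        for block in matcher.get_matching_blocks():
            for offset in range(block.size):
                line = block.b + offset
                # Only lines we are still suspected for can be passed up.
                if line in suspected:
                    suspected.remove(line)
                    inherited[line] = parent_index
    return inherited, sorted(suspected)


if __name__ == "__main__":
    child = ["a", "b", "c", "d"]
    # A merge with two parents; the same loop handles one parent.
    print(split_blame(child, [["a", "b"], ["c"]]))
```

The loop body is the same whether parent_versions holds one entry or
several, which is the point made above.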

>   Now, this is very nifty and so for moving functions around, but I
> think it is very dangerous for anything where ordering matters - large
> array definitions, patch series files, etc. In that case, you've
> completely omitted the fact that the movement occurred, which can be
> crucial and based on the current behaviour of the tools, I think people
> expect this now. To put it shortly, "who wrote this line" vs "who put
> this line here".

That's exactly why we have -f and -n options, so that the
program that reads from blame output can tell where things came
from.  It is not about "who wrote it" vs "who put it here";
pickaxe gives a lot more than that: "where did this originally
come from", i.e. "by whom in which file at what line number was
the line created".

If the user is not prepared to see code movement, pickaxe can be
run with neither -M nor -C to get the classic blame output.

By the way, we would not want to detect code movement (or
copying from an unrelated file) for very trivial lines.  E.g. you
would not want to blame the following three lines:

        +	}
        +}
        +#endif

to a random, unrelated file that happens to contain an exact copy
of them.  Something like the above can
happen almost anywhere.  The current implementation of -M/-C
does not do this very well.  find_copy_in_blob() currently
passes blame to the parent when it finds nontrivial copies, but
instead it should inspect the patch and return a score, and the
caller should take the parent with the best match and assign
blame to it.
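
A rough sketch of that score-and-pick-the-best idea, again in Python
rather than git's C.  Everything here is an assumption made for
illustration: the function names, the scoring rule (count of
alphanumeric characters in the matched lines) and the min_score
threshold are not taken from the actual code.

```python
from difflib import SequenceMatcher


def chunk_score(lines):
    """Hypothetical score: alphanumeric characters in the lines."""
    return sum(ch.isalnum() for line in lines for ch in line)


def find_copy_score(chunk, blob_lines):
    """Score the longest contiguous run of chunk found in blob_lines."""
    matcher = SequenceMatcher(None, blob_lines, chunk, autojunk=False)
    match = matcher.find_longest_match(0, len(blob_lines), 0, len(chunk))
    return chunk_score(chunk[match.b:match.b + match.size])


def best_parent(chunk, candidates, min_score=20):
    """Return the candidate name with the best score, or None.

    candidates maps a name (e.g. a path in the parent) to its lines.
    Matches scoring below min_score are treated as too trivial.
    """
    best_name, best = None, 0
    for name, blob_lines in candidates.items():
        score = find_copy_score(chunk, blob_lines)
        if score > best:
            best_name, best = name, score
    return best_name if best >= min_score else None


if __name__ == "__main__":
    trivial = ["\t}", "}", "#endif"]
    unrelated = {"unrelated.c": ["int x;", "\t}", "}", "#endif"]}
    # Only "#endif" contributes to the score, so nobody gets the blame.
    print(best_parent(trivial, unrelated))
```

With a rule like this, the "}" / "#endif" lines above match the
unrelated file exactly but score too low to be blamed on it.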
