git-blame and finding previous version of a line

Junio C Hamano <gitster@xxxxxxxxx> wrote
in "What's cooking in git.git (Feb 2009, #06; Wed, 18)"

> * jc/blame (Wed Jun 4 22:58:40 2008 -0700) 2 commits
>  + blame: show "previous" information in --porcelain/--incremental
>    format
>  + git-blame: refactor code to emit "porcelain format" output
>
> This gives Porcelains (like gitweb) the information on the commit
> _before_ the one that the final blame is laid on, which should save
> them one rev-parse to dig further.  The line number in the "previous"
> information may need refining, and sanity checking code for reference
> counting may need to be resurrected before this can move forward.
>
> I thought recent tig discussion may blow new life into it, but is
> this unneeded?  If so I'd rather revert it (or discard after 1.6.2).

The commit message for the second patch in this series says:

   blame: show "previous" information in --porcelain/--incremental format
    
   When the final blame is laid for a line to a <commit, path> pair, it also
   gives a "previous" information to --porcelain and --incremental output
   format.  It gives the parent commit of the blamed commit, _and_ a path in
   that parent commit that corresponds to the blamed path --- in short, it is
   the origin that would have been blamed (or passed blame through) for the
   line _if_ the blamed commit did not change that line.

(The patch itself doesn't include an update to the documentation.)  I
guess this means that --porcelain and --incremental output have an
additional header:

   "previous" <sha-1 of parent> <whitespace-quoted-filename>

I also guess that if it is a merge commit that got blamed (because it
was an evil merge; otherwise one of the parents, or one of their
ancestors, would get the blame), we would get two or more "previous"
info lines, in the order of the parents.
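A minimal Python sketch of how a Porcelain could pick up that header from `git blame --porcelain` output. It assumes the guessed `previous <sha-1> <filename>` form with an unquoted filename (a quoted name would need unquoting first), collects every such line per commit in case a merge yields more than one, and relies on the porcelain conventions that each entry starts with a `<sha-1> <orig line> <final line> [<group size>]` line and that the blamed line's content is prefixed with a TAB. The function name `parse_previous` is hypothetical.

```python
import re

# First line of each porcelain entry:
# "<40-hex sha-1> <orig line> <final line> [<lines in group>]"
ENTRY_RE = re.compile(r"^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$")


def parse_previous(porcelain: str) -> dict[str, list[tuple[str, str]]]:
    """Map each blamed commit to the (parent sha-1, filename) pairs
    given in its "previous" headers, in the order they appear.

    Assumes "previous <sha-1> <filename>" with an unquoted filename.
    """
    previous: dict[str, list[tuple[str, str]]] = {}
    current = None
    for line in porcelain.splitlines():
        if line.startswith("\t"):
            continue  # content of the blamed line, never a header
        m = ENTRY_RE.match(line)
        if m:
            current = m.group(1)  # headers that follow belong to this commit
            continue
        if current is not None and line.startswith("previous "):
            sha1, _, filename = line[len("previous "):].partition(" ")
            previous.setdefault(current, []).append((sha1, filename))
    return previous
```

The input would be the standard output of `git blame --porcelain <file>`; commits without a "previous" header (for example a root commit) simply do not appear in the result.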


I assume that the filename in the "previous" info can differ from the
filename in the blamed commit only because of whole-file rename
detection, and that it does not reflect detection of code fragment
movement by itself... or does it?

This info would be even more helpful for gitweb than I thought, because
of the 'filename' part.  We can simply relax the refname restrictions
and use <blamed commit>^ or <blamed commit>^<n> for the 'hb' parameter,
but the filename causes some trouble (although renames should happen
rarely).  In one of the solutions I thought of, there was an
intermediate step where gitweb resolved <ref>^ to <sha1> and did an
HTTP redirect; in that solution there is a place where gitweb could
find the previous filename (the filename in <rev>^, given the filename
in <rev>), but it would be a mess.
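With the "previous" information at hand, building the link becomes simple string formatting. The sketch below assumes gitweb's `p` (project), `a` (action), `hb` (hash base) and `f` (file name) query parameters and an `#l<N>` line anchor; the helper `previous_blame_url` and the example host are hypothetical. Without a "previous" pair it falls back to `<blamed commit>^` and the unchanged filename, which only works if the refname restrictions are relaxed as described above.

```python
from typing import Optional
from urllib.parse import urlencode


def previous_blame_url(base: str, project: str, blamed_commit: str,
                       filename: str, lineno: int,
                       previous: Optional[tuple[str, str]] = None) -> str:
    """Build a gitweb blame link for the version before `blamed_commit`.

    `previous` is the (parent sha-1, filename) pair from a "previous"
    header; without it, fall back to the first parent and the same name.
    """
    if previous is not None:
        hash_base, file_name = previous
    else:
        hash_base, file_name = blamed_commit + "^", filename
    # urlencode percent-encodes "^" and "/" and turns spaces into "+".
    query = urlencode({"p": project, "a": "blame", "hb": hash_base, "f": file_name})
    return f"{base}?{query}#l{lineno}"
```

Note that `lineno` here is still the line's position in the blamed commit, so this link keeps the same-position assumption discussed below.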


In 244a70e6 (Blame "linenr" link jumps to previous state at
"orig_lineno"), Luben Tuikov made gitweb link to the previous version
of a file (always using the first parent) for better data mining, in
other words to be able to follow the history of a given line.  The
current code makes a few assumptions:
 * we are always interested in the first parent; this matters only for
   'evil merges', i.e. when the merge commit itself was blamed, which
   should be a fairly rare case
 * the name of the file is the same in the parent as in the blamed
   commit; without the proposed "previous" header in blame output we
   would have to run git-diff-tree to check it, all for the rare case of
   a file rename, or else somewhat complicate resolving the filename
   after the link is clicked
 * the previous version of a given line is at the same position in the
   file in the parent, or at least close to it

It is the last assumption that is, I think, the hardest to correct.

What algorithm would you propose to find the previous version of a
line?  I don't think this question has a definitive answer, so some
heuristic would be required.  The previous version of a line might not
even exist!  (In that case we would probably want to land at the place
where it was inserted.)  Fortunately, this is a situation where an
approximation is good enough.


(I don't know whether git-blame has access to the textual diff, or at
least to the information in the hunk headers, when calculating blame
information, so I don't know if the following algorithm is feasible.)

I propose the following algorithm:
 * find the hunk in the textual diff whose postimage contains the
   current version of the line; searching the hunk headers for the
   line number should be enough here
 * get the line numbers of the corresponding preimage (I'm not sure
   whether this algorithm would fail here if code movement detection is
   enabled)
 * either find the most similar line in the preimage, or calculate
   (perhaps with linear interpolation) which line in the preimage line
   range the given line number in the postimage line range corresponds to
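The proposed algorithm can be sketched in Python, assuming the hunks come from a diff with no context lines (e.g. `git diff -U0 <parent> <commit> -- <path>`), so that every line outside a hunk is unchanged and only shifted.  `map_line` implements the linear-interpolation variant and `most_similar` the "most similar line" variant via `difflib.SequenceMatcher`; all names are hypothetical, the code is not taken from git-blame, and code movement (`-M`/`-C`) is ignored.

```python
import difflib
import re

# Unified-diff hunk header; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunks(diff_text: str) -> list[tuple[int, int, int, int]]:
    """Return (old_start, old_count, new_start, new_count) for each hunk."""
    hunks = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            o_start, o_count, n_start, n_count = m.groups()
            hunks.append((int(o_start), int(o_count or 1),
                          int(n_start), int(n_count or 1)))
    return hunks


def map_line(hunks: list[tuple[int, int, int, int]], n: int) -> int:
    """Guess which preimage line corresponds to postimage line `n`."""
    shift = 0  # net lines added (new - old) by the hunks before `n`
    for old_start, old_count, new_start, new_count in hunks:
        # For an empty range the header names the line *before* it.
        begin = new_start if new_count else new_start + 1
        if n < begin:
            break  # unchanged line in front of this hunk
        if n < begin + new_count:
            if old_count == 0:
                # Pure insertion: point at the preimage line it follows.
                return old_start
            # Linear interpolation of the postimage range onto the preimage range.
            return old_start + (n - begin) * old_count // new_count
        shift += new_count - old_count
    return n - shift  # unchanged line, only shifted by earlier hunks


def most_similar(line: str, preimage: list[str], old_start: int) -> int:
    """Alternative: the preimage line whose text best matches `line`."""
    if not preimage:
        return old_start
    scores = [difflib.SequenceMatcher(None, line, old).ratio() for old in preimage]
    return old_start + scores.index(max(scores))
```

With the default three lines of context the same code still runs, but context lines inside a hunk would then be interpolated rather than mapped exactly.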

What do you think about this algorithm? Is it good enough?


P.S. I think that going to the blamed commit's version might also be
     interesting: you can check how the neighbourhood of a given line
     changed, can't you?
-- 
Jakub Narebski
Poland