Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle

Linus Torvalds <torvalds@xxxxxxxx> writes:

> ...  We're starting to see 
> git actually being able to track file content moving between files: even 
> when the files themselves didn't move (ie Junio's "git pickaxe" work could 
> do things like that).

I've reordered the git-pickaxe series I parked in "pu" during the
1.4.3-rc cycle and merged it into "next".

The earlier one I was futzing with in "pu" had built-in
heuristics and pure mechanisms mixed together in the same patch,
which was quite bad as development history.  I think the
reordered sequence shows the logical evolution better.

  1. git-pickaxe: blame rewritten.

     This implements the infrastructure (parent traversal,
     identifying "corresponding path" in the parent -- aka
     "handling renames", passing blames to the parents and
     taking responsibility for the remainder) and uses the
     same old "single diff with parent file identifies what we
     inherited from the parent" logic git-blame uses for passing
     blames.

  2. git-pickaxe -M: blame line movements within a file.

     This adds logic to find swapped groups of lines in the same
     file.  When the file in the parent had A and B and the child
     has B and A, "single diff with parent" would find only one
     of A or B is inherited from the parent, not both.  This
     re-diffs the remainder with the parent's file to find both.

     I used to have heuristics to keep trivial groups of lines
     from being subject to this step, but in this version they
     have been removed, so that we can see the core logic and
     the need for heuristics more clearly.

     On the other hand, the version I used to have in "pu" gave
     blame to the first match.  This one tries to find the best
     match and assign the blame to it.

  3. git-pickaxe -C: blame cut-and-pasted lines.

     This adds logic to find groups of lines brought in from an
     existing file in the parent.  We scan the remainder using
     the same logic as -M detection, but it is done against
     other files in the parent.

     There was a heuristic that gave the blame to the parent
     right then and there when we found a copy-and-paste, instead
     of allowing the parent to pass blame further on to its
     ancestors; again I removed this heuristic in the reordered
     series.
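
As an illustration only (this is not the C code in git-pickaxe), the
three steps above can be sketched in Python with the standard difflib
module.  The function names, the dict-of-files representation of the
parent tree, and the use of "longest common run" as the notion of
"best match" are all assumptions made for this sketch.  It handles a
single parent and does not pass blame further on to ancestors.

```python
from difflib import SequenceMatcher


def contiguous_runs(indexes):
    """Group line indexes into half-open (start, stop) runs."""
    runs = []
    for i in sorted(indexes):
        if runs and runs[-1][1] == i:
            runs[-1] = (runs[-1][0], i + 1)
        else:
            runs.append((i, i + 1))
    return runs


def best_match(lines, sources):
    """Longest common run between `lines` and any file in `sources`.

    Returns (path, source_start, lines_start, size); size is 0 if none.
    """
    best = (None, 0, 0, 0)
    for path, content in sources.items():
        sm = SequenceMatcher(None, content, lines, autojunk=False)
        m = sm.find_longest_match(0, len(content), 0, len(lines))
        if m.size > best[3]:
            best = (path, m.a, m.b, m.size)
    return best


def pass_blame(child, runs, sources, origin):
    """Blame the best match in each run; return the runs left over."""
    work = list(runs)
    left_over = []
    while work:
        start, stop = work.pop()
        path, a, b, size = best_match(child[start:stop], sources)
        if size == 0:
            left_over.append((start, stop))
            continue
        for k in range(size):
            origin[start + b + k] = (path, a + k)
        # The pieces before and after the match are still unaccounted for.
        if b > 0:
            work.append((start, start + b))
        if b + size < stop - start:
            work.append((start + b + size, stop))
    return left_over


def blame_against_parent(parent_files, path, child):
    """Map child line index -> (parent path, parent line index)."""
    origin = {}
    parent = parent_files.get(path, [])

    # Step 1: a single diff with the parent's version of the same path.
    sm = SequenceMatcher(None, parent, child, autojunk=False)
    for blk in sm.get_matching_blocks():
        for k in range(blk.size):
            origin[blk.b + k] = (path, blk.a + k)
    rest = contiguous_runs(i for i in range(len(child)) if i not in origin)

    # Step 2 (-M): re-diff the remainder against the same parent file.
    rest = pass_blame(child, rest, {path: parent}, origin)

    # Step 3 (-C): same logic, but against the other files in the parent.
    others = {p: c for p, c in parent_files.items() if p != path}
    pass_blame(child, rest, others, origin)
    return origin


if __name__ == "__main__":
    parent_files = {
        "f.c": ["a1", "a2", "a3", "b1", "b2"],
        "g.c": ["x", "c1", "c2", "y"],
    }
    child = ["b1", "b2", "c1", "c2", "a1", "a2", "a3"]
    for i, src in sorted(blame_against_parent(parent_files, "f.c", child).items()):
        print(i, child[i], src)
```

In the usage example, the single diff only accounts for the "a" lines;
the "b" lines are picked up by the second pass against the same file,
and the "c" lines by the third pass against g.c.  Lines that match
nothing in the parent stay out of the map, which corresponds to the
child taking responsibility for them.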

The next logical step is to come up with a good set of
heuristics to avoid the excessive nonsense matches the code
currently gives.

Groups of a small number of empty lines, lines with indentation
blanks followed by a closing brace, and '#include' lines that
include common header files occur so commonly that, without any
heuristics (which is what the "next" branch has today), the
algorithm gives surprisingly idiotic results.  For example:

	git -p pickaxe -C -f -n v1.4.3 -- commit.c

tells you that the first line of commit.c in the v1.4.3 release,
which is '#include "cache.h"', came from the first line of
receive-pack.c, which is total nonsense (this particular line
could actually be a bug in the -M or -C logic -- I need to
check).

A less "obviously wrong" but still idiotic case is that we find
ll.409-411 came from ll.94-96 of describe.c in commit 908e5310.
These three lines read as:

	409		}
	410	}
	411

While this blame assignment might be technically correct, it
does not add much value to pass blames on in such a case.
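
One possible shape for such a heuristic, sketched in Python purely as
an illustration: refuse to pass blame for a short group whose lines are
all blank, a lone closing brace, or an '#include'.  Nothing here is
taken from git-pickaxe; the function name, the regular expression and
the three-line threshold are assumptions.  The sketch also treats every
'#include' as trivial, whereas only commonly included headers are the
real problem.

```python
import re

# Blank line, indentation followed by a lone "}", or an #include line.
_TRIVIAL_LINE = re.compile(r'^\s*(\}|#\s*include\b.*)?\s*$')


def is_trivial_group(lines, max_lines=3):
    """True if the group is too short and too generic to be worth blaming.

    `max_lines` is an arbitrary threshold chosen for this sketch.
    """
    return len(lines) <= max_lines and all(
        _TRIVIAL_LINE.match(line) for line in lines
    )


if __name__ == "__main__":
    print(is_trivial_group(["\t\t}", "}", ""]))
    print(is_trivial_group(['#include "cache.h"']))
    # Hypothetical function header, standing in for real content.
    print(is_trivial_group(["static int f(void)", "{"]))
```

A check like this would sit in front of the -M and -C passes, so that
a trivial group stays with the child instead of being matched against
some unrelated file.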

On the brighter side, we find that ll.415-419 (the beginning of
function "static int get_one_line()") originally came from
diff-tree.c (commit cee99d22, ll.275-279).
