Re: [PATCH/RFC 0/3] faster inexact rename handling

On Mon, Oct 29, 2007 at 10:06:11PM -0700, Linus Torvalds wrote:

> Have you compared the results? IOW, does it find the *same* renames?

From my limited testing, it generally finds the same pairs. However,
there are a number of renames that it _doesn't_ find, because they are
composed of "uninteresting" lines, dropping them below the minimum
score. Try (in git.git):

  git-show --raw -M -l0 :/'Big tool rename'

with the old and new code. Pairs like Documentation/git-add-script.txt
-> Documentation/git-add.txt are not found, because the file is composed
almost entirely of boilerplate.

Moving the size normalization into the similarity engine should probably
fix that, and will let us compare old and new results more accurately.
I'll try to work on that.
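
To make the failure mode concrete, here is a toy sketch in Python (not the
patch's code; the line-based scoring, the `common` set, and the 0.5
threshold are all assumptions chosen for illustration). It shows how a
file made mostly of boilerplate scores low when the match count is
normalized by total file size, and how normalizing over only the lines
the engine actually considers would change that:

```python
MIN_SCORE = 0.5  # assumed minimum score for this illustration


def score_by_total_size(src, dst, common):
    """Count shared interesting lines, normalized by total line count."""
    interesting_src = {line for line in src if line not in common}
    matched = sum(1 for line in dst if line in interesting_src)
    return matched / max(len(src), len(dst), 1)


def score_by_interesting_size(src, dst, common):
    """Same match count, normalized by the number of interesting lines."""
    src_kept = [line for line in src if line not in common]
    dst_kept = [line for line in dst if line not in common]
    lookup = set(src_kept)
    matched = sum(1 for line in dst_kept if line in lookup)
    return matched / max(len(src_kept), len(dst_kept), 1)


# Hypothetical pair: 8 lines of shared boilerplate plus 2 unique lines.
boilerplate = [f"boilerplate {i}" for i in range(8)]
src = boilerplate + ["unique a", "unique b"]
dst = boilerplate + ["unique a", "unique b"]
common = set(boilerplate)  # lines judged too common to be interesting

old_score = score_by_total_size(src, dst, common)
new_score = score_by_interesting_size(src, dst, common)
print(old_score < MIN_SCORE, new_score >= MIN_SCORE)
```

In this toy case the identical pair is rejected under the first
normalization and accepted under the second, which is roughly the
behavior I would hope for from the real change.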

> I'm a bit worried about the fact that you just pick a single (arbitrary) 
> src/dst per fingerprint. Yes, it should be limited, but that seems to be a 
> bit too *extremely* limited. But if it gives the same results in practice, 
> maybe nobody cares?

Yes, I have not convinced myself yet that it's the right approach (but
it seemed like a good place to try first, for simplicity and speed).  As
I noted, this approach seems to be a bit memory hungry on large inputs, so I am
a bit concerned about increasing the size of the fingerprint_entry
structure.  However, Andy's sampling approach might help fix that.

The current code also doesn't bother marking overflow, so common lines
get attributed to some random file (actually, worse than random: if a
bunch of files have the same common lines, _all_ of the lines will go to
the last file, which means we subtly favor renames from the end of the
input list). So probably it should be tested as-is, with an "overflow,
this line is too common to be interesting" bit, and with a small-ish
limit (I had at one point tried 5, but the implementation was naive and
too memory-hungry).
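
As a rough illustration of the overflow idea (a Python sketch, not the
actual C code; the `FingerprintEntry` fields, `add_line`, `candidates`,
and the limit of 5 are hypothetical names and values for this example):
each fingerprint remembers a small number of files, and once too many
files share it, the entry is marked as overflowed and stops counting
toward any file.

```python
from dataclasses import dataclass, field

LIMIT = 5  # small-ish cap on files remembered per fingerprint (assumed)


@dataclass
class FingerprintEntry:
    files: list = field(default_factory=list)  # files containing this line
    overflow: bool = False  # True: line is too common to be interesting


def add_line(table, fingerprint, file_id, limit=LIMIT):
    """Record that file_id contains a line with this fingerprint."""
    entry = table.setdefault(fingerprint, FingerprintEntry())
    if entry.overflow or file_id in entry.files:
        return
    if len(entry.files) >= limit:
        # Too many files share this line: mark it and drop the list,
        # so no single file (such as the last one seen) gets the credit.
        entry.overflow = True
        entry.files.clear()
        return
    entry.files.append(file_id)


def candidates(table, fingerprint):
    """Files that may be credited for this fingerprint (none if overflowed)."""
    entry = table.get(fingerprint)
    if entry is None or entry.overflow:
        return []
    return list(entry.files)


table = {}
for file_id in range(7):  # a line shared by 7 files exceeds the limit
    add_line(table, "common-line", file_id)
add_line(table, "rare-line", 3)

print(candidates(table, "common-line"), candidates(table, "rare-line"))
```

Clearing the file list on overflow also keeps the per-entry memory
bounded, which matters given the memory concerns above.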

-Peff