Re: [PATCH 3/3] diffcore-rename: guide inexact rename detection based on basenames

Elijah Newren <newren@xxxxxxxxx> writes:

> idea is still possible.  For example, A.txt could have been compared
> to source/some-module/A.txt.  And I don't do anything in the final
> "full matrix" stage to avoid re-comparing those two files again.
> However, it is worth noting that A.txt will have been compared to at
> most one other file, not N files.

Sorry, but where does this "at most one other file" come from?  "It
is rare to remove source/some-other-module/A.txt at the same time
while the above is happening"?  If so, yes, that sounds like a
sensible thing.

> 1) The most expensive comparison is the first one,...

Yes, we keep the spanhash table across comparisons.

> 2) This would only save us from at most N comparisons in the N x M
> matrix (since no file in this optimization is compared to more than
> one other)

True, but don't rename_src[] and rename_dst[] entries have the
original pathname, where you can see A.txt and some-module/A.txt
share the same filename part cheaply?  Is that more expensive than
comparing spanhash tables?
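
To illustrate how cheap the filename check is, here is a minimal
sketch in Python (not the actual diffcore-rename.c code).  The helper
names are hypothetical, and it assumes, as a simplification, that
every same-basename pair was already compared in the earlier stage:

```python
import posixpath


def shares_basename(src_path, dst_path):
    """Return True if both paths end in the same filename component."""
    # Git paths always use '/' as the separator, hence posixpath.
    return posixpath.basename(src_path) == posixpath.basename(dst_path)


def pairs_for_full_matrix(sources, destinations):
    """List the (source, destination) pairs left for the full-matrix stage.

    Hypothetical helper: same-basename pairs are skipped on the
    assumption that the earlier stage has already compared them.
    """
    return [
        (src, dst)
        for src in sources
        for dst in destinations
        if not shares_basename(src, dst)
    ]
```

The check is a string comparison on pathnames that are already in
memory; no file content has to be read or hashed.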

Having asked these, I do think it is not worth pursuing, especially
because I agree with Derrick that this "we see a new file whose name
is the same as the one deleted from a different directory, so if
they are similar enough, let's declare victory and not bother
finding a better match" needs to be used with a higher similarity bar
than the normal one.  If -M60 says "only consider pairs with at
least 60% similarity index", finding one at 60% similarity and
stopping at it only because the pair looks to move a file from one
directory to another directory while retaining the same name,
rejecting other pairings, feels a bit too crude a heuristic.  And if
we require higher similarity levels to short-circuit, the later full
matrix stage won't be helped with "we must have already rejected"
logic.  A.txt and some-module/A.txt may not have been similar enough
to short-circuit and reject others in the earlier part, but the
full-matrix part works at a lower bar, which may consider the pair
good enough to keep as match candidates.
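
The two-bar argument can be sketched as follows.  This is only an
illustration in Python, not code from the patch; the function name is
hypothetical, scores are written as percentages, and the 80% shortcut
bar is an assumed value chosen for the example:

```python
def classify_pair(score, minimum_score=60, shortcut_score=80):
    """Classify a same-basename pair by its similarity score (percent).

    minimum_score  -- the normal bar, as given by -M60
    shortcut_score -- assumed stricter bar for stopping early
    """
    if score >= shortcut_score:
        # Similar enough to accept the pair and skip other candidates.
        return "short-circuit"
    if score >= minimum_score:
        # Failed the early bar, yet still a valid full-matrix candidate.
        return "candidate"
    return "rejected"
```

A pair scoring 70% lands in the middle band: it did not short-circuit,
but the full-matrix stage cannot treat it as "already rejected".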

Thanks.



