Re: [PATCH v3 3/5] diffcore-rename: complete find_basename_matches()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Elijah Newren <newren@xxxxxxxxx> writes:

> I can change that.  I can also simplify it further to
>
>         if (0 <= (dst_index = (strintmap_get(&dests, base)))) {
>
> since dests uses a default value of -1.  That will decrease the number
> of strmap lookups here from 2 to 1.

Which would be a real win, unlike what I said in the message you are
responding to.

>> It feels incompatible with the spirit of these two steps aim for
>> (i.e. only use this optimization on a pair of src/dst with UNIQUE
>> basenames).  For the purpose of "we only handle unique ones", the
>> paths that already have matched should participate in deciding if
>> the files that survived "exact" phase have unique basename among
>> the original inpu?
>
> Yeah, I should have been more careful with my wording.  Stated a
> different way, what confidence can we associate with an exact rename?

Suppose you start with a/Makefile, b/Makefile and c/Makefile and
then they all disappeared while a1/Makefile, b1/Makefile and
c1/Makefile now are in the tree.  The contents a/Makefile used to
have appears without any difference in a1/Makefile, the same for b
and b1, but c/Makefile and c1/Makefile are different.  The c vs c1
pair may worth investigating, so it goes through the "same basename"
phase.

Now, in a slightly different situation, a vs a1 are still identical,
but b vs b1 have only one blank line removal but without any other
change.  It looks odd that such a change has to pessimize c vs c1
optimization opportunity, but an interesting part of the story is
that we can only say "such a change", not "such a miniscule change",
because we have just finished the "exact" phase, and we do not know
how big a difference b vs b1 pair actually had.

That makes me feel that this whole "we must treat unique one that
remains specially" is being incoherent.  If "because we have only
small number of removed and added Makefiles spread across the trees,
first full-matrix matching among them without anything else with
higher bar may be worth an optimization" were the optimization, then
I would understand and support the design to omit those that have
already been matched in the "exact" phase.

But IIRC, limiting this "same basename" phase to unique add/del pair
was sold as a way to make it less likely for the heuristics to make
mistakes, yet the definition of "unique", as shown above, is not all
that solid.  That I find it rather unsatisfactory.

In other words, it is not "what confidence do we have in exact
phase?"  "exact" matching may have found perfect matching pair.  But
the found pair should be happy just between themselves, and should
not have undue effect on how _other_ pairs are compared.  Stopping
the "exact" pair from participating in the "uniqueness" definition
is placing "exact" phase too much weight to affect how other filepairs
are found.

> By the exact same argument, you
> could take this a step further and say that we should calculate the
> basenames of *all* files in the tree, not just add/delete pairs, and
> only match up the ones via basename that are *truly* unique.  After
> all, break detection exists, so perhaps we don't have full confidence
> that files with an unchanged fullname are actually related.

Sorry, but you are not making sense.  These optimizations are done
only when we are not using copies and breaks, no?  What _other_
changes that kept the paths the same, or modified in place, have any
effect on matching added and deleted pairs?



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux