Re: [PATCH 2/3] Introduce rename factorization in diffcore.

Yann Dirson <ydirson@xxxxxxxxxx> writes:

> Rename factorization tries to group together files moving from and to
> identical directories - the most common case being directory renames.
> We do that by first identifying groups of bulk-moved files, and then
> hiding those of the individual renames which carry no other
> information (further name change, or content changes).
> This feature is activated by the new --factorize-renames diffcore
> flag.

I have mixed feelings about this one, primarily because I cannot
visualize what useful output would look like.  Unless you rename one
directory to another without any content changes, you would have to say
"this directory changed to that, and among the paths underneath them, this
file has this content change in addition".

A related feature that would benefit from something like your change,
without any of the output-format complications, is to boost the rename
similarity score of a path when its neighbouring paths are moved to the
same location.  E.g. when you see:

 - three files a/{1,2,3} deleted;
 - three files b/{1,2,3} created;
 - (a/1 => b/1) and (a/2 => b/2) are similar enough;
 - (a/3 => b/3) is not similar enough.

we currently detect only two renames and leave the deletion of a/3 and
the creation of b/3 unpaired.  You should be able to help pair them up by
noticing that the entire a/* goes away (for that, reading the full
postimage as your patch does helps) and boosting the similarity score
between these two.
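To make the idea concrete, here is a minimal C sketch of such a boost.
Everything in it is hypothetical: struct pair_candidate, boosted_score(),
the 0..60000 score scale (assumed, chosen to resemble diffcore's range),
the DIR_BONUS value, and the src_dir_gone flag, which stands in for
whatever check of the full postimage tells you that a/ has gone away.  It
is not how diffcore_rename() is actually structured.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in; not diffcore's real data structure. */
struct pair_candidate {
	const char *src;	/* deleted path, e.g. "a/3" */
	const char *dst;	/* created path, e.g. "b/3" */
	int score;		/* content similarity on a 0..MAX_SCORE scale */
};

#define MAX_SCORE 60000	/* assumed scale, modelled on diffcore's */
#define MIN_SCORE 30000	/* assumed acceptance threshold (50%) */
#define DIR_BONUS 20000	/* hypothetical boost; would need tuning */

/* Length of the directory part of path, including the trailing '/'. */
static size_t dir_len(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? (size_t)(slash - path) + 1 : 0;
}

/* Do a and b live directly in the same directory? */
static int same_dir(const char *a, const char *b)
{
	size_t la = dir_len(a), lb = dir_len(b);
	return la == lb && !strncmp(a, b, la);
}

/*
 * Return cand's score, boosted when its source directory has gone away
 * entirely (src_dir_gone, computed by the caller from the postimage) and
 * an already-accepted rename moved a sibling from the same source
 * directory to the same destination directory.
 */
static int boosted_score(const struct pair_candidate *cand,
			 const struct pair_candidate *accepted, int nr_accepted,
			 int src_dir_gone)
{
	int i, score = cand->score;

	assert(score >= 0 && score <= MAX_SCORE);
	if (!src_dir_gone)
		return score;
	for (i = 0; i < nr_accepted; i++) {
		if (same_dir(accepted[i].src, cand->src) &&
		    same_dir(accepted[i].dst, cand->dst)) {
			score += DIR_BONUS;
			break;
		}
	}
	return score > MAX_SCORE ? MAX_SCORE : score;
}
```

With a/1 => b/1 and a/2 => b/2 already accepted and a/ gone, a candidate
a/3 => b/3 scored at 15000 is lifted to 35000, above the assumed 30000
threshold; a candidate a/3 => c/3 gets no help because no sibling went to
c/.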

Although I cannot offhand think of a good format to show the information
you are trying to capture in the textual diff output, one thing that would
benefit from grouping renames the way you do is process_renames() in
merge_recursive.c.  This is especially so when you have added a new path
in a directory that the other branch you are merging has moved.  For this
usage, there are no "textual output format" issues.  It does not even have
to be expressed by replacing individual entries in diffq with entries that
represent a whole subtree; you could, for example, keep what diffq.queue
records intact, and add a separate list of directory renames as a hint for
users like process_renames() to use.
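A rough sketch of such a side list, again in C and again entirely
hypothetical: struct file_rename, struct dir_rename_hint,
collect_dir_hints() and relocate_new_path() are made-up names, not
existing git functions.  The file-level renames are only read, never
rewritten; a hint a/ -> b/ is kept only when every rename out of a/ went
to b/.  For brevity it does not check that a/ vanished completely, and it
only relocates paths directly under a/.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_HINTS 64
#define DIR_MAX 256

/* Hypothetical rename record, standing in for an entry in diffq.queue. */
struct file_rename {
	const char *src, *dst;
};

/* Hypothetical hint kept beside, not inside, the file-level queue. */
struct dir_rename_hint {
	char src[DIR_MAX];	/* e.g. "a/" */
	char dst[DIR_MAX];	/* e.g. "b/" */
	int consistent;		/* cleared once a file from src went elsewhere */
};

/* Length of the directory part of path, including the trailing '/'. */
static size_t dir_len(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? (size_t)(slash - path) + 1 : 0;
}

/* Derive one hint per source directory; renames[] is only read. */
static int collect_dir_hints(const struct file_rename *renames, int nr,
			     struct dir_rename_hint *hints)
{
	int i, j, nr_hints = 0;

	for (i = 0; i < nr; i++) {
		size_t sl = dir_len(renames[i].src), dl = dir_len(renames[i].dst);

		if (!sl || !dl || sl >= DIR_MAX || dl >= DIR_MAX)
			continue;	/* top-level or overlong paths: no hint */
		for (j = 0; j < nr_hints; j++)
			if (strlen(hints[j].src) == sl &&
			    !strncmp(hints[j].src, renames[i].src, sl))
				break;
		if (j < nr_hints) {
			/* seen this source dir before: same destination? */
			if (strlen(hints[j].dst) != dl ||
			    strncmp(hints[j].dst, renames[i].dst, dl))
				hints[j].consistent = 0;
			continue;
		}
		if (nr_hints == MAX_HINTS)
			continue;	/* table full; a real version would grow it */
		memcpy(hints[j].src, renames[i].src, sl);
		hints[j].src[sl] = '\0';
		memcpy(hints[j].dst, renames[i].dst, dl);
		hints[j].dst[dl] = '\0';
		hints[j].consistent = 1;
		nr_hints++;
	}
	return nr_hints;
}

/*
 * If path sits directly under the source of a consistent hint, write the
 * relocated path to out and return 1; otherwise return 0.
 */
static int relocate_new_path(const char *path,
			     const struct dir_rename_hint *hints, int nr_hints,
			     char *out, size_t out_size)
{
	size_t pl = dir_len(path);
	int j;

	assert(out_size > 0);
	for (j = 0; j < nr_hints; j++) {
		int n;

		if (!hints[j].consistent || strlen(hints[j].src) != pl ||
		    strncmp(hints[j].src, path, pl))
			continue;
		n = snprintf(out, out_size, "%s%s", hints[j].dst, path + pl);
		return n >= 0 && (size_t)n < out_size;
	}
	return 0;
}
```

For renames a/1 => b/1 and a/2 => b/2, a path a/new.c added by the other
side would be suggested as b/new.c, while x/ yields no usable hint if its
files were scattered to y/ and z/.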
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
