[PATCH 2/2] diffcore-break: save cnt_data for other phases

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The "break" phase works by counting changes between two
blobs with the same path. We do this by splitting the file
into chunks (or lines for text oriented files) and then
keeping a count of chunk hashes.

The "rename" phase counts changes between blobs at two
different paths. However, it uses the exact same set of
chunk hashes (which are immutable for a given sha1).

The rename phase can therefore use the same hash data as
break. Unfortunately, we were throwing this data away after
computing it in the break phase. This patch instead attaches
it to the filespec and lets it live through the rename
phase, working under the assumption that most of the time
that breaks are being computed, renames will be too.

We only do this optimization for files which have actually
been broken, as those ones will be candidates for rename
detection (and it is a time-space tradeoff, so we don't want
to waste space keeping useless data).

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
Note the two assumptions above:

  1. If we are running break, we will probably run rename detection.

  2. We are not looking for inexact copies, which could re-use data even
     for non-broken pairs.

We could test both of those assumptions by peeking at the other
diff_options that have been set up, but that would involve some code
restructuring. I'm also not sure it is worth it.

For (1), if we don't do rename detection, the next thing we will do is
the diffcore merge, which will then free the data anyway. So we are
really just carrying around the extra cnt_data through the break, and it
is much smaller than the actual blob data.

For (2), keep in mind that this is just a heuristic. We may not have any
good rename candidates anyway, or we may find an exact rename. So it is
just a guess as to which data might be useful. Keeping all the data for
all pairs, broken or not, may be pushing us too far along the time-space
tradeoff.

 diffcore-break.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/diffcore-break.c b/diffcore-break.c
index 15562e4..3a7b60a 100644
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -69,7 +69,7 @@ static int should_break(struct diff_filespec *src,
 		return 0; /* we do not break too small filepair */
 
 	if (diffcore_count_changes(src, dst,
-				   NULL, NULL,
+				   &src->cnt_data, &dst->cnt_data,
 				   0,
 				   &src_copied, &literal_added))
 		return 0;
@@ -204,8 +204,8 @@ void diffcore_break(int break_score)
 				dp->score = score;
 				dp->broken_pair = 1;
 
-				diff_free_filespec_data(p->one);
-				diff_free_filespec_data(p->two);
+				diff_free_filespec_blob(p->one);
+				diff_free_filespec_blob(p->two);
 				free(p); /* not diff_free_filepair(), we are
 					  * reusing one and two here.
 					  */
-- 
1.6.5.2.187.g29317
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]