[PATCH v2 1/4] diffcore-rename: compute basenames of all source and dest candidates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Elijah Newren <newren@xxxxxxxxx>

We want to make use of unique basenames to help inform rename detection,
so that more likely pairings can be checked first.  (src/moduleA/foo.txt
and source/module/A/foo.txt are likely related if there are no other
'foo.txt' files among the deleted and added files.)  Add a new function,
not yet used, which creates a map of the unique basenames within
rename_src and another within rename_dst, together with the indices
within rename_src/rename_dst where those basenames show up.  Non-unique
basenames still show up in the map, but have an invalid index (-1).

This function was inspired by the fact that in real world repositories,
most renames often do not involve a basename change.  Here are some
sample repositories and the percentage of their historical renames (as of
early 2020) that did not involve a basename change:
  * linux: 76%
  * gcc: 64%
  * gecko: 79%
  * webkit: 89%

Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
---
 diffcore-rename.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 74930716e70d..1c52077b04e5 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -367,6 +367,59 @@ static int find_exact_renames(struct diff_options *options)
 	return renames;
 }
 
+MAYBE_UNUSED
+static int find_basename_matches(struct diff_options *options,
+				 int minimum_score,
+				 int num_src)
+{
+	int i;
+	struct strintmap sources;
+	struct strintmap dests;
+
+	/* Create maps of basename -> fullname(s) for sources and dests */
+	strintmap_init_with_options(&sources, -1, NULL, 0);
+	strintmap_init_with_options(&dests, -1, NULL, 0);
+	for (i = 0; i < num_src; ++i) {
+		char *filename = rename_src[i].p->one->path;
+		char *base;
+
+		/* exact renames removed in remove_unneeded_paths_from_src() */
+		assert(!rename_src[i].p->one->rename_used);
+
+		base = strrchr(filename, '/');
+		base = (base ? base+1 : filename);
+
+		/* Record index within rename_src (i) if basename is unique */
+		if (strintmap_contains(&sources, base))
+			strintmap_set(&sources, base, -1);
+		else
+			strintmap_set(&sources, base, i);
+	}
+	for (i = 0; i < rename_dst_nr; ++i) {
+		char *filename = rename_dst[i].p->two->path;
+		char *base;
+
+		if (rename_dst[i].is_rename)
+			continue; /* involved in exact match already. */
+
+		base = strrchr(filename, '/');
+		base = (base ? base+1 : filename);
+
+		/* Record index within rename_dst (i) if basename is unique */
+		if (strintmap_contains(&dests, base))
+			strintmap_set(&dests, base, -1);
+		else
+			strintmap_set(&dests, base, i);
+	}
+
+	/* TODO: Make use of basenames source and destination basenames */
+
+	strintmap_clear(&sources);
+	strintmap_clear(&dests);
+
+	return 0;
+}
+
 #define NUM_CANDIDATE_PER_DST 4
 static void record_if_better(struct diff_score m[], struct diff_score *o)
 {
-- 
gitgitgadget




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux