[PATCHv2 2/3] --dirstat-by-file: Make it faster and more correct

Currently, --dirstat-by-file first runs the full --dirstat analysis (using
diffcore_count_changes()), and then resets 'damage' to 1 if
diffcore_count_changes() found any damage.

But --dirstat-by-file is not interested in the amount of damage per file.
It only cares whether the file changed at all, i.e. whether the blob SHA1
for the file has changed. We therefore only need to compare the SHA1s of
each file pair in the diff queue. As a result, we can skip the entire
--dirstat analysis and simply set 'damage' to 1 for each entry where the
SHA1 has changed.

This makes --dirstat-by-file faster, and also bypasses --dirstat's practice
of ignoring rearranged lines within a file.

This patch also adds a testcase verifying that --dirstat-by-file now
detects changes that only rearrange lines within a file.

Signed-off-by: Johan Herland <johan@xxxxxxxxxxx>
---
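The by-file rule described in the commit message can be sketched outside
of git as follows. This is only an illustration: 'struct filespec_sketch',
'by_file_damage()' and 'HASH_LEN' are hypothetical names, not part of
diff.c, and plain memcmp() stands in for git's hashcmp().

```c
#include <assert.h>
#include <string.h>

#define HASH_LEN 20 /* SHA-1 digest size in bytes */

/* Hypothetical stand-in for the diff_filespec fields used by the patch. */
struct filespec_sketch {
	unsigned char sha1[HASH_LEN];
	int sha1_valid; /* nonzero if sha1 holds a known blob name */
};

/* Returns 1 if the file pair counts as changed, 0 otherwise. */
static unsigned long by_file_damage(const struct filespec_sketch *one,
				    const struct filespec_sketch *two)
{
	assert(one && two);

	/* Without both hashes we cannot prove equality: assume a change. */
	if (!one->sha1_valid || !two->sha1_valid)
		return 1;

	/* File contents are never read; only the blob names are compared. */
	return memcmp(one->sha1, two->sha1, HASH_LEN) != 0;
}
```

Every changed file contributes the same damage of 1, so the resulting
percentages reflect how many files changed in each directory rather than
how many lines changed.
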
 diff.c                                             |   25 ++++++++++++++++----
 t/t4013-diff-various.sh                            |    2 +
 .../diff.diff_--dirstat-by-file_initial_rearrange  |    3 ++
 3 files changed, 25 insertions(+), 5 deletions(-)
 create mode 100644 t/t4013/diff.diff_--dirstat-by-file_initial_rearrange

diff --git a/diff.c b/diff.c
index 9fa8410..a224048 100644
--- a/diff.c
+++ b/diff.c
@@ -1538,9 +1538,27 @@ static void show_dirstat(struct diff_options *options)
 		struct diff_filepair *p = q->queue[i];
 		const char *name;
 		unsigned long copied, added, damage;
+		int content_changed;
 
 		name = p->one->path ? p->one->path : p->two->path;
 
+		if (p->one->sha1_valid && p->two->sha1_valid)
+			content_changed = hashcmp(p->one->sha1, p->two->sha1);
+		else
+			content_changed = 1;
+
+		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE)) {
+			/*
+			 * In --dirstat-by-file mode, we don't really need to
+			 * look at the actual file contents at all.
+			 * The fact that the SHA1 changed is enough for us to
+			 * add this file to the list of results
+			 * (with each file contributing equal damage).
+			 */
+			damage = content_changed ? 1 : 0;
+			goto found_damage;
+		}
+
 		if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) {
 			diff_populate_filespec(p->one, 0);
 			diff_populate_filespec(p->two, 0);
@@ -1563,14 +1581,11 @@ static void show_dirstat(struct diff_options *options)
 		/*
 		 * Original minus copied is the removed material,
 		 * added is the new material.  They are both damages
-		 * made to the preimage. In --dirstat-by-file mode, count
-		 * damaged files, not damaged lines. This is done by
-		 * counting only a single damaged line per file.
+		 * made to the preimage.
 		 */
 		damage = (p->one->size - copied) + added;
-		if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE) && damage > 0)
-			damage = 1;
 
+found_damage:
 		ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc);
 		dir.files[dir.nr].name = name;
 		dir.files[dir.nr].changed = damage;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 3b1b392..6428a90 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -302,6 +302,8 @@ diff master master^ side
 diff --dirstat master~1 master~2
 # --dirstat doesn't notice changes that simply rearrange existing lines
 diff --dirstat initial rearrange
+# ...but --dirstat-by-file does notice changes that only rearrange lines
+diff --dirstat-by-file initial rearrange
 EOF
 
 test_expect_success 'log -S requires an argument' '
diff --git a/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange b/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
new file mode 100644
index 0000000..e48e33f
--- /dev/null
+++ b/t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
@@ -0,0 +1,3 @@
+$ git diff --dirstat-by-file initial rearrange
+ 100.0% dir/
+$
-- 
1.7.5.rc1.3.g4d7b
