[PATCH 2/2] diff: don't retrieve binary blobs for diffstat

We only need the size, which is much cheaper to get,
especially if it is a big binary file.

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
This of course is only really helpful if you have marked the files as
binary via gitattributes, since otherwise we have to pull in the blob to
find out that it's binary. :)

But it does help in my real-world photo/video repo, which has media files
marked via gitattributes as binary (but with textconv to show exif tags,
of course). The commit in question has 26 files totalling 88 megabytes.

  $ time git show --stat >old
  real    0m0.428s
  user    0m0.392s
  sys     0m0.032s

  $ time git.jk.diffstat-binary show --stat >new
  real    0m0.005s
  user    0m0.004s
  sys     0m0.000s

  $ cmp old new && echo ok
  ok

8500% speedup isn't too bad. :)
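
For intuition about where the time goes, here is a minimal standalone
sketch of the same trade-off on ordinary files. It is not git code:
size_by_reading() and size_by_stat() are hypothetical helpers made up for
this illustration. The first stands in for the old fill_mmfile() path,
which pulls in the whole content just to learn its length; the second
stands in for the size-only path, which asks only for metadata.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for the old behavior: read every byte of the
 * file just to find out how long it is. Cost grows with file size.
 */
static long size_by_reading(const char *path)
{
	FILE *fp;
	char buf[8192];
	size_t n;
	long total = 0;

	assert(path != NULL);
	fp = fopen(path, "rb");
	if (!fp)
		return -1;
	while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
		total += (long)n;
	fclose(fp);
	return total;
}

/*
 * Hypothetical stand-in for the size-only behavior: ask the filesystem
 * for metadata and never touch the content. Cost does not depend on
 * how big the file is.
 */
static long size_by_stat(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) < 0)
		return -1;
	return (long)st.st_size;
}
```

Both helpers return the same number for a readable regular file and -1
on error. Only the first one does work proportional to the file size,
which is the cost this patch avoids for blobs already known to be binary.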

 diff.c |   15 +++++++++++----
 1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 2ac0fe9..6640857 100644
--- a/diff.c
+++ b/diff.c
@@ -245,6 +245,15 @@ static int fill_mmfile(mmfile_t *mf, struct diff_filespec *one)
 	return 0;
 }
 
+/* like fill_mmfile, but only for size, so we can avoid retrieving blob */
+static unsigned long diff_filespec_size(struct diff_filespec *one)
+{
+	if (!DIFF_FILE_VALID(one))
+		return 0;
+	diff_populate_filespec(one, 1);
+	return one->size;
+}
+
 static int count_trailing_blank(mmfile_t *mf, unsigned ws_rule)
 {
 	char *ptr = mf->ptr;
@@ -2081,11 +2090,9 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 	}
 
 	if (diff_filespec_is_binary(one) || diff_filespec_is_binary(two)) {
-		if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
-			die("unable to read files to diff");
 		data->is_binary = 1;
-		data->added = mf2.size;
-		data->deleted = mf1.size;
+		data->added = diff_filespec_size(two);
+		data->deleted = diff_filespec_size(one);
 	}
 
 	else if (complete_rewrite) {
-- 
1.7.4.1.26.g3372c