[PATCH] apply delta depth bias to already deltified objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We already apply a bias on the initial delta attempt with max_size being 
a function of the base object depth.  This already has the effect of 
favoring shallower deltas even if deeper deltas could be smaller, and 
therefore creating a wider delta tree (see commits 4e8da195 and 
c3b06a69).

This principle should also be applied to all delta attempts for the same 
object and not only the first attempt.  With this the criteria for the 
best delta is not only its size but also its depth, so that a shallower 
delta might be selected even if it is larger than a deeper one.  Even if 
some deltas get larger, they allow for wider delta trees making the 
depth limit less quickly reached and therefore better deltas can be 
subsequently found, keeping the resulting pack size equal or even 
smaller.  Runtime access to the pack should also benefit from shallower 
deltas.

Testing on the GIT repo before this patch provided those numbers:

$ git repack -a -f --depth=10
Total 56534 (delta 39441), reused 0 (delta 0)
29.38user 0.33system 0:29.73elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+20996minor)pagefaults 0swaps

$ git verify-pack -v
chain length = 1: 3150 objects
chain length = 2: 2713 objects
chain length = 3: 2441 objects
chain length = 4: 2205 objects
chain length = 5: 2014 objects
chain length = 6: 1895 objects
chain length = 7: 1802 objects
chain length = 8: 1997 objects
chain length = 9: 2710 objects
chain length = 10: 18514 objects

Pack size is 18273039 bytes.

With this patch:

$ git repack -a -f --depth=10
Total 56534 (delta 39370), reused 0 (delta 0)
24.32user 0.35system 0:24.78elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+19142minor)pagefaults 0swaps

$ git verify-pack -v
chain length = 1: 4059 objects
chain length = 2: 4048 objects
chain length = 3: 3775 objects
chain length = 4: 3428 objects
chain length = 5: 3007 objects
chain length = 6: 2793 objects
chain length = 7: 2701 objects
chain length = 8: 2988 objects
chain length = 9: 3935 objects
chain length = 10: 8636 objects

Pack size is 16311008 bytes.

Summary: this patch makes for slightly faster repacks, a smaller pack, 
and a much lower count of object hanging on the max delta depth.

Same repo again with the default depth of 50:

Before:
Total 56534 (delta 39545), reused 0 (delta 0)
23.01user 0.36system 0:23.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+21017minor)pagefaults 0swaps
chain length = 1: 2882 objects
chain length = 2: 2443 objects
chain length = 3: 2157 objects
[...]
chain length = 48: 505 objects
chain length = 49: 782 objects
chain length = 50: 1802 objects
Pack size: 14651260 bytes.

After:

Total 56534 (delta 39596), reused 0 (delta 0)
22.98user 0.20system 0:23.20elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+20560minor)pagefaults 0swaps
chain length = 1: 2985 objects
chain length = 2: 2625 objects
chain length = 3: 2282 objects
[...]
chain length = 48: 477 objects
chain length = 49: 564 objects
chain length = 50: 392 objects
Pack size: 14383165 bytes.

With the Linux kernel repo:

Before:

Total 531582 (delta 431824), reused 0 (delta 0)
276.66user 3.64system 4:45.35elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+149606minor)pagefaults 0swaps
chain length = 1: 40779 objects
chain length = 2: 35650 objects
chain length = 3: 30226 objects
chain length = 4: 25619 objects
chain length = 5: 21709 objects
[...]
chain length = 46: 2223 objects
chain length = 47: 3263 objects
chain length = 48: 3175 objects
chain length = 49: 2128 objects
chain length = 50: 6413 objects
Pack size: 155680720 bytes.

After:

Total 531582 (delta 432537), reused 0 (delta 0)
270.31user 3.15system 4:35.07elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+140517minor)pagefaults 0swaps
chain length = 1: 41555 objects
chain length = 2: 36919 objects
chain length = 3: 31217 objects
chain length = 4: 26762 objects
chain length = 5: 22348 objects
[...]
chain length = 46: 1842 objects
chain length = 47: 1800 objects
chain length = 48: 1483 objects
chain length = 49: 1216 objects
chain length = 50: 1310 objects
Pack size: 154421656 bytes.


Signed-off-by: Nicolas Pitre <nico@xxxxxxx>
---

This apparently makes BRian's patological case worse (although better 
than before his same-size-shallower patch), but I think that the 
improvement in the general case is worth it.  Even Brian's pack gets 
smaller so...

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 54b9d26..1eb86ed 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1332,12 +1332,14 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 
 	/* Now some size filtering heuristics. */
 	trg_size = trg_entry->size;
-	max_size = trg_size/2 - 20;
+	if (!trg_entry->delta)
+		max_size = trg_size/2 - 20;
+	else
+		max_size = trg_entry->delta_size * max_depth /
+				(max_depth - trg_entry->depth + 1);
 	max_size = max_size * (max_depth - src_entry->depth) / max_depth;
 	if (max_size == 0)
 		return 0;
-	if (trg_entry->delta && trg_entry->delta_size <= max_size)
-		max_size = trg_entry->delta_size;
 	src_size = src_entry->size;
 	sizediff = src_size < trg_size ? trg_size - src_size : 0;
 	if (sizediff >= max_size)
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux