Re: Shallow clone

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sergey Vlasov <vsu@xxxxxxxxxxx> writes:

> This is due to optimization in builtin-pack-objects.c:try_delta():
>
> 	/*
> 	 * We do not bother to try a delta that we discarded
> 	 * on an earlier try, but only when reusing delta data.
> 	 */
> 	if (!no_reuse_delta && trg_entry->in_pack &&
> 	    trg_entry->in_pack == src_entry->in_pack)
> 		return 0;
>
> After removing this part the shallow pack after clone is 2.6M, as it
> should be.
>
> The problem with this optimization is that it is only valid if we are
> repacking either the same set of objects as we did earlier, or its
> superset.  But if we are packing a subset of objects, there will be some
> objects in that subset which were delta-compressed in the original pack,
> but base objects for that deltas are not included in our subset -
> therefore we will be unable to reuse existing deltas, and with that
> optimization we will never try to use delta compression for these
> objects.
> ...
> So any partial fetch (shallow or not) from a mostly packed repository
> currently results in a suboptimal pack.

That is correct.  How about something like this?

I think the determination of "repacking_superset" may need to be
tweaked because existing packs may have overlaps, and the patch
counts them once per pack.


diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 69e5dd3..fb25124 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -64,6 +64,7 @@ struct object_entry {
 static unsigned char object_list_sha1[20];
 static int non_empty;
 static int no_reuse_delta;
+static int repacking_superset;
 static int local;
 static int incremental;
 static int allow_ofs_delta;
@@ -1172,10 +1173,13 @@ static int try_delta(struct unpacked *tr
 		return -1;
 
 	/*
-	 * We do not bother to try a delta that we discarded
-	 * on an earlier try, but only when reusing delta data.
+	 * When we are packing the superset of objects we have already
+	 * packed, we do not bother to try a delta that we discarded
+	 * on an earlier try.  This heuristic of course should not
+	 * kick in when we are not reusing delta, or we know we are
+	 * sending a subset of objects from a repository.
 	 */
-	if (!no_reuse_delta && trg_entry->in_pack &&
+	if (!no_reuse_delta && repacking_superset && trg_entry->in_pack &&
 	    trg_entry->in_pack == src_entry->in_pack)
 		return 0;
 
@@ -1493,6 +1497,16 @@ static void get_object_list(int ac, cons
 	traverse_commit_list(&revs, show_commit, show_object);
 }
 
+static int count_packed_objects(void)
+{
+	struct packed_git *p;
+	int cnt = 0;
+
+	for (p = packed_git; p; p = p->next)
+		cnt += num_packed_objects(p);
+	return cnt;
+}
+
 int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 {
 	SHA_CTX ctx;
@@ -1631,6 +1645,8 @@ int cmd_pack_objects(int argc, const cha
 	if (non_empty && !nr_result)
 		return 0;
 
+	repacking_superset = count_packed_objects() < nr_result;
+
 	SHA1_Init(&ctx);
 	list = sorted_by_sha;
 	for (i = 0; i < nr_result; i++) {


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]