Sergey Vlasov <vsu@xxxxxxxxxxx> writes: > This is due to optimization in builtin-pack-objects.c:try_delta(): > > /* > * We do not bother to try a delta that we discarded > * on an earlier try, but only when reusing delta data. > */ > if (!no_reuse_delta && trg_entry->in_pack && > trg_entry->in_pack == src_entry->in_pack) > return 0; > > After removing this part the shallow pack after clone is 2.6M, as it > should be. > > The problem with this optimization is that it is only valid if we are > repacking either the same set of objects as we did earlier, or its > superset. But if we are packing a subset of objects, there will be some > objects in that subset which were delta-compressed in the original pack, > but base objects for that deltas are not included in our subset - > therefore we will be unable to reuse existing deltas, and with that > optimization we will never try to use delta compression for these > objects. > ... > So any partial fetch (shallow or not) from a mostly packed repository > currently results in a suboptimal pack. That is correct. How about something like this? I think the determination of "repacking_superset" may need to be tweaked because existing packs may have overlaps, and the patch counts them once per pack. diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 69e5dd3..fb25124 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -64,6 +64,7 @@ struct object_entry { static unsigned char object_list_sha1[20]; static int non_empty; static int no_reuse_delta; +static int repacking_superset; static int local; static int incremental; static int allow_ofs_delta; @@ -1172,10 +1173,13 @@ static int try_delta(struct unpacked *tr return -1; /* - * We do not bother to try a delta that we discarded - * on an earlier try, but only when reusing delta data. + * When we are packing the superset of objects we have already + * packed, we do not bother to try a delta that we discarded + * on an earlier try. This heuristic of course should not + * kick in when we are not reusing delta, or we know we are + * sending a subset of objects from a repository. */ - if (!no_reuse_delta && trg_entry->in_pack && + if (!no_reuse_delta && repacking_superset && trg_entry->in_pack && trg_entry->in_pack == src_entry->in_pack) return 0; @@ -1493,6 +1497,16 @@ static void get_object_list(int ac, cons traverse_commit_list(&revs, show_commit, show_object); } +static int count_packed_objects(void) +{ + struct packed_git *p; + int cnt = 0; + + for (p = packed_git; p; p = p->next) + cnt += num_packed_objects(p); + return cnt; +} + int cmd_pack_objects(int argc, const char **argv, const char *prefix) { SHA_CTX ctx; @@ -1631,6 +1645,8 @@ int cmd_pack_objects(int argc, const cha if (non_empty && !nr_result) return 0; + repacking_superset = count_packed_objects() < nr_result; + SHA1_Init(&ctx); list = sorted_by_sha; for (i = 0; i < nr_result; i++) { - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html