On Fri, Mar 16, 2018 at 04:06:39PM -0400, Jeff King wrote:

> > Furthermore, in order to look at an object it has to be zlib inflated
> > first, and since commit objects tend to be much smaller than trees and
> > especially blobs, there are a lot less bytes to inflate:
> >
> >   $ grep ^commit type-size |cut -d' ' -f2 |avg
> >   34395730 / 53754 = 639
> >   $ cat type-size |cut -d' ' -f2 |avg
> >   3866685744 / 244723 = 15800
> >
> > So a simple revision walk inflates less than 1% of the bytes that the
> > "enumerate objects packfiles" approach has to inflate.
>
> I don't think this is quite accurate. It's true that we have to
> _consider_ every object, but Git is smart enough not to inflate each one
> to find its type. For loose objects we just inflate the header. For
> packed objects, we either pick the type directly out of the packfile
> header (for a non-delta) or can walk the delta chain (without actually
> looking at the data bytes!) until we hit the base.

Hmm, so that's a big part of the problem with this patch series. It
actually _does_ unpack every object with --stdin-packs to get the type,
which is just silly.

With the patch below, my time for "commit-graph write --stdin-packs" on
linux.git goes from over 5 minutes (I got bored and killed it) to 17
seconds.

diff --git a/commit-graph.c b/commit-graph.c
index 6348bab82b..cf1da2e8c1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -491,11 +491,12 @@ static int add_packed_commits(const struct object_id *oid,
 {
 	struct packed_oid_list *list = (struct packed_oid_list*)data;
 	enum object_type type;
-	unsigned long size;
-	void *inner_data;
 	off_t offset = nth_packed_object_offset(pack, pos);
-	inner_data = unpack_entry(pack, offset, &type, &size);
-	FREE_AND_NULL(inner_data);
+	struct object_info oi = OBJECT_INFO_INIT;
+
+	oi.typep = &type;
+	if (packed_object_info(pack, offset, &oi) < 0)
+		die("unable to get type of object %s", oid_to_hex(oid));
 
 	if (type != OBJ_COMMIT)
 		return 0;

-Peff
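
For anyone wondering why reading the type is so much cheaper than
unpack_entry(): each packed object starts with a small variable-length
header that carries a 3-bit type and the inflated size, so the type of a
non-delta object is available before any zlib work. The standalone
sketch below decodes that header. It is only an illustration of the
on-disk layout, not the code Git uses; parse_entry_header() and the
PACK_OBJ_* names are made up for this example. For the two delta types
the real type still has to be found by following the chain to its base,
as described in the quoted text.

```c
#include <stddef.h>
#include <stdint.h>

/* Type codes as stored in a pack entry header (5 is reserved). */
enum pack_obj_type {
	PACK_OBJ_COMMIT = 1,
	PACK_OBJ_TREE = 2,
	PACK_OBJ_BLOB = 3,
	PACK_OBJ_TAG = 4,
	PACK_OBJ_OFS_DELTA = 6,
	PACK_OBJ_REF_DELTA = 7
};

/*
 * Hypothetical helper, not a Git function: decode the entry header that
 * starts at buf. The first byte holds a continuation bit, the 3-bit type
 * and the low 4 bits of the size; each following byte adds 7 more size
 * bits. Returns the number of header bytes consumed, or 0 if the buffer
 * is truncated or the size would not fit in 64 bits.
 */
static size_t parse_entry_header(const unsigned char *buf, size_t len,
				 int *type, uint64_t *size)
{
	size_t used = 0;
	unsigned shift = 4;
	unsigned char c;

	if (!len)
		return 0;

	c = buf[used++];
	*type = (c >> 4) & 7;
	*size = c & 15;

	while (c & 0x80) {
		if (used >= len || shift > 57)
			return 0;
		c = buf[used++];
		*size |= (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}
	return used;
}
```

As a worked case, the two bytes 0x95 0x0a decode to type 1 (commit) with
an inflated size of 165, and the zlib stream would begin at the third
byte without ever being touched.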