Re: [PATCH v6 00/14] Serialized Git Commit Graph

Jeff King <peff@xxxxxxxx> · Fri, 16 Mar 2018 16:19:09 -0400

On Fri, Mar 16, 2018 at 04:06:39PM -0400, Jeff King wrote:

> > Furthermore, in order to look at an object it has to be zlib inflated
> > first, and since commit objects tend to be much smaller than trees and
> > especially blobs, there are a lot less bytes to inflate:
> > 
> >   $ grep ^commit type-size |cut -d' ' -f2 |avg
> >   34395730 / 53754 = 639
> >   $ cat type-size |cut -d' ' -f2 |avg
> >   3866685744 / 244723 = 15800
> > 
> > So a simple revision walk inflates less than 1% of the bytes that the
> > "enumerate objects packfiles" approach has to inflate.
> 
> I don't think this is quite accurate. It's true that we have to
> _consider_ every object, but Git is smart enough not to inflate each one
> to find its type. For loose objects we just inflate the header. For
> packed objects, we either pick the type directly out of the packfile
> header (for a non-delta) or can walk the delta chain (without actually
> looking at the data bytes!) until we hit the base.

Hmm, so that's a big part of the problem with this patch series. It
actually _does_ unpack every object with --stdin-packs to get the type,
which is just silly. With the patch below, my time for "commit-graph
write --stdin-packs" on linux.git goes from over 5 minutes (I got bored
and killed it) to 17 seconds.

diff --git a/commit-graph.c b/commit-graph.c
index 6348bab82b..cf1da2e8c1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -491,11 +491,12 @@ static int add_packed_commits(const struct object_id *oid,
 {
 	struct packed_oid_list *list = (struct packed_oid_list*)data;
 	enum object_type type;
-	unsigned long size;
-	void *inner_data;
 	off_t offset = nth_packed_object_offset(pack, pos);
-	inner_data = unpack_entry(pack, offset, &type, &size);
-	FREE_AND_NULL(inner_data);
+	struct object_info oi = OBJECT_INFO_INIT;
+
+	oi.typep = &type;
+	if (packed_object_info(pack, offset, &oi) < 0)
+		die("unable to get type of object %s", oid_to_hex(oid));
 
 	if (type != OBJ_COMMIT)
 		return 0;

-Peff