On Wed, Aug 08, 2018 at 04:12:10PM -0700, Jonathan Tan wrote: > Many invocations of for_each_object_in_pack() and > for_each_packed_object() (which invokes the former) subsequently check > at least the type of the packed object, necessitating accessing the > packfile itself. For locality reasons, it is thus better to iterate in > pack order, instead of index order. Teach for_each_object_in_pack() to > iterate in pack order by first creating a reverse index. > > This is based off work by Jeff King. > > Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx> > --- > After writing this patch and looking at it further, I'm not sure if this > is a clear benefit, but here's the patch anyway. In particular, > builtin/fsck.c and builtin/cat-file.c just deal with the OID directly > and does not access the packfile at all (at least at the time of > invoking for_each_packed_object). And revision.c, if we are excluding > promisor objects, parses each packed promisor object, but it seems > possible to avoid doing that (replacing the parse_object() by > lookup_unknown_object() still passes tests). Even if you just use the oid to do a separate lookup in the object database, there's still a benefit in accessing the objects in pack order. The case in cat-file needs more than this, though, since it separately sorts the output (it has to, because it has to merge and de-dup the output from several packs plus loose objects). With the patch below on top of yours, I get: $ time git.v2.18.0 cat-file --batch-all-objects --buffer --batch | wc -c 6938365964 real 0m44.686s user 0m42.932s sys 0m5.283s $ time git.compile cat-file --batch-all-objects --buffer --batch | wc -c 8289859070 real 0m7.007s user 0m5.542s sys 0m4.005s But: - it needs to de-duplicate using a hashmap (which is why the output is so much bigger in the second case) - it probably needs to be enabled explicitly by the user, since cat-file is plumbing and callers may be relying on the existing sort order I can try to pick this up and carry the cat-file bits to completion if you want, but probably not until tomorrow or Friday. -Peff