Re: [RFC PATCH] packfile: iterate packed objects in pack order

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 08, 2018 at 04:12:10PM -0700, Jonathan Tan wrote:

> Many invocations of for_each_object_in_pack() and
> for_each_packed_object() (which invokes the former) subsequently check
> at least the type of the packed object, necessitating accessing the
> packfile itself. For locality reasons, it is thus better to iterate in
> pack order, instead of index order. Teach for_each_object_in_pack() to
> iterate in pack order by first creating a reverse index.
> 
> This is based off work by Jeff King.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---
> After writing this patch and looking at it further, I'm not sure if this
> is a clear benefit, but here's the patch anyway. In particular,
> builtin/fsck.c and builtin/cat-file.c just deal with the OID directly
> and does not access the packfile at all (at least at the time of
> invoking for_each_packed_object). And revision.c, if we are excluding
> promisor objects, parses each packed promisor object, but it seems
> possible to avoid doing that (replacing the parse_object() by
> lookup_unknown_object() still passes tests).

Even if you just use the oid to do a separate lookup in the object
database, there's still a benefit in accessing the objects in pack
order. The case in cat-file needs more than this, though, since it
separately sorts the output (it has to, because it has to merge and
de-dup the output from several packs plus loose objects).

With the patch below on top of yours, I get:

  $ time git.v2.18.0 cat-file --batch-all-objects --buffer --batch | wc -c
  6938365964

  real	0m44.686s
  user	0m42.932s
  sys	0m5.283s

  $ time git.compile cat-file --batch-all-objects --buffer --batch | wc -c
  8289859070

  real	0m7.007s
  user	0m5.542s
  sys	0m4.005s

But:

  - it needs to de-duplicate using a hashmap (which is why the output is
    so much bigger in the second case)

  - it probably needs to be enabled explicitly by the user, since
    cat-file is plumbing and callers may be relying on the existing sort
    order

I can try to pick this up and carry the cat-file bits to completion if
you want, but probably not until tomorrow or Friday.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux