Re: [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed'

On Fri, Jan 29, 2021 at 03:03:08PM -0800, Junio C Hamano wrote:

> Taylor Blau <me@xxxxxxxxxxxx> writes:
> 
> > So, I think that teaching pack-objects a way to understand a caller that
> > says "include objects from packs X, Y, and Z, but not if they appear in
> > packs A, B, or C, and also pull in any loose objects" is the best way
> > forward here.
> 
> Do our goals still include that the resulting packfile has good
> delta compression and object locality?  Reachability traversal
> discovers which commit comes close to which other commits to help
> pack-objects to arrange the resulting pack so that objects that
> appear close together in history appear close together.  It also
> gives each object a pathname hint to help group objects of the same
> type (either blobs or trees) with like-paths together for better
> deltification.
> 
> Without reachability traversal, I would imagine that it would become
> quite important to keep the order in which objects appear in the
> original pack, and existing delta chain, as much as possible, or
> we'd be seeing a horribly inefficient pack like fast-import would
> produce.

Thanks, that's another good point. We discussed it a while ago
(off-list), but it hasn't come up in this discussion yet.

Another option here is not to roll up packs at all, but instead to use a
midx to cover them all[1]. That solves the issue where object lookup is
O(nr_packs), and you retain the same locality and delta characteristics.

But I think part of the goal is to actually improve the deltas, in two
ways:

  - we'd hopefully find new delta opportunities between objects in the
    various packs

  - we'll drop some objects that are duplicated in other packs.
    Definitely we have to avoid duplicates in the roll-up pack, but I
    think we'd want to even for objects that are in the "big" kept pack.
    These are likely bases of deltas in our roll-up pack, since the
    common cause there is --fix-thin adding them to complete the pack.
    But we really prefer to serve fetches using the ones out of the main
    pack, since they may already themselves be deltas (which makes them
    way cheaper; we can send the delta straight off the disk, rather
    than looking for a new possible base).
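
The object selection described above ("objects from packs X, Y, and Z,
but not if they appear in packs A, B, or C, plus loose objects") can be
sketched as a simple set computation. This is a toy Python model, not
the pack-objects implementation; select_rollup_objects and its
arguments are hypothetical names, and packs are modeled as plain lists
of object ids.

```python
def select_rollup_objects(included_packs, kept_packs, loose_objects):
    """Return the object ids to write into the roll-up pack.

    An object is selected if it appears in an included pack or is loose,
    unless it also appears in a kept pack. Each object is emitted once,
    in order of first appearance, so duplicates never reach the output.
    """
    kept = set()
    for pack in kept_packs:
        kept.update(pack)

    selected = []
    seen = set()
    for source in list(included_packs) + [loose_objects]:
        for oid in source:
            if oid in kept or oid in seen:
                continue
            seen.add(oid)
            selected.append(oid)
    return selected
```

Keeping the order of first appearance is a deliberate choice in this
sketch, since it preserves whatever locality the source packs already
had.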

So I would anticipate the delta-compression phase actually trying to do
some new work. I do worry that the lack of pathname hints may make the
deltas we find much worse (or cause us to spend excessive CPU
searching for them). It's possible we could do a "best effort" traversal
where we walk new commits to find newly added pathnames, but don't
bother crossing into trees/commits that aren't in the set of objects to
be packed. It's OK to optimize for speed there, because it's just
feeding the delta heuristic, not the set of objects we'd plan to pack.
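
A minimal sketch of that "best effort" walk, again as a toy Python
model rather than anything resembling git's revision machinery. The
data layout is an assumption made for illustration: commits maps a
commit id to its root tree id, trees maps a tree id to a list of
(name, oid, is_tree) entries, and to_pack is the set of object ids
being packed. collect_path_hints is a hypothetical name.

```python
def collect_path_hints(commits, trees, to_pack):
    """Collect pathname hints for objects in to_pack, best effort.

    Only commits in to_pack are walked, and the walk never descends
    into a tree that is outside to_pack. Objects that are never reached
    simply get no hint; that only affects the delta heuristic, not the
    set of objects to pack.
    """
    hints = {}
    visited = set()
    for commit, root_tree in commits.items():
        if commit not in to_pack:
            continue
        stack = [(root_tree, "")]
        while stack:
            tree, prefix = stack.pop()
            if tree not in to_pack or tree in visited:
                continue  # do not cross into trees we are not packing
            visited.add(tree)
            hints.setdefault(tree, prefix)
            for name, oid, is_tree in trees.get(tree, []):
                path = prefix + name
                if is_tree:
                    stack.append((oid, path + "/"))
                elif oid in to_pack:
                    hints.setdefault(oid, path)  # first path seen wins
    return hints
```

The walk is cheap because its cost is bounded by the trees that are
actually being packed, at the price of leaving some objects without a
hint.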

-Peff

[1] Our end-game plan is actually to _also_ use a midx to cover the
    roll-ups and the "big" pack, since we'd want to generate bitmaps for
    the new objects, too.


