Re: [PATCH 09/24] repack: implement `--extend-disjoint` mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 28, 2023 at 02:08:18PM -0500, Taylor Blau wrote:
> Now that we can generate packs which are disjoint with respect to the
> set of currently-disjoint packs, implement a mode of `git repack` which
> extends the set of disjoint packs with any new (non-cruft) pack(s)
> generated during the repack.
> 
> The idea is mostly straightforward, with a couple of gotcha's. The
> straightforward part is to make sure that any new packs are disjoint
> with respect to the set of currently disjoint packs which are _not_
> being removed from the repository as a result of the repack.
> 
> If a pack which is currently marked as disjoint is, on the other hand,
> about to be removed from the repository, it is OK (and expected) that
> new pack(s) will contain some or all of its objects. Since the pack
> originally marked as disjoint will be removed, it will necessarily leave
> the disjoint set, making room for new packs with its same objects to
> take its place. In other words, the resulting set of disjoint packs will
> be disjoint with respect to one another.
> 
> The gotchas mostly have to do with making sure that we do not generate a
> disjoint pack in the following scenarios:

Okay, let me verify whether I understand the reasons:

>   - promisor packs

Which is because promisor packs actually don't contain any objects?

>   - cruft packs (which may necessarily need to include an object from a
>     disjoint pack in order to freshen it in certain circumstances)

This one took me a while to figure out. If we'd mark crufts as disjoint,
then it would mean that new packfiles cannot be marked as disjoint if
objects which were previously unreachable do become reachable again.
So we'd be pessimizing packfiles for live objects in favor of others
which aren't.

>   - all-into-one repacks without '-d'

Because here the old packfiles that this would make redundant aren't
deleted and thus the objects are duplicate now.

>   - `--filter-to`, which conceptually could work with the new
>     `--extend-disjoint` option, but only in limited circumstances

We're probably also not properly set up to handle the new alternate
object directory and exclude objects that are part of a potentially
disjoint packfile that exists already. Also, the current MIDX may not
even cover the alternate.

> Otherwise, we mark which packs were created as disjoint by using a new
> bit in the `generated_pack_data` struct, and then marking those pack(s)
> as disjoint accordingly when generating the MIDX. Non-deleted packs
> which are marked as disjoint are retained as such by passing the
> equivalent of `--retain-disjoint` when calling the MIDX API to update
> the MIDX.

Okay. I had a bit of trouble to sift through the various different
flags like "--retain-disjoint", "--extend-disjoint", "--ignore-disjoint"
and so on. But well, they do different things and it's been a few days
since I've reviewed the preceding patches, so this is probably fine.

One thing I wondered: do we need to consider the `-l` flag? When using
an alternate object directory it is totally feasible that the alternate
may be creating new disjoint packages without us knowing, and thus we
may not be able to guarantee the disjoint property anymore.

Patrick

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux