Re: [PATCH 08/17] builtin/pack-objects.c: --cruft without expiration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 06, 2021 at 04:44:31PM -0500, Derrick Stolee wrote:
> On 11/29/2021 5:25 PM, Taylor Blau wrote:
> > Generating a non-expiring cruft packs works as follows:
>
> I had trouble parsing the documentation changes below, so I came back
> to this commit message to see if that helps.
>
> >   - Callers provide a list of every pack they know about, and indicate
> >     which packs are about to be removed.
>
> This corresponds to the list over stdin.
>
> >   - All packs which are going to be removed (we'll call these the
> >     redundant ones) are marked as kept in-core, as well as any packs
> >     that `pack-objects` found but the caller did not specify.
>
> Ok, so as an implementation detail we mark these as keep packs.


> >     These packs are presumed to have entered the repository between
> >     the caller collecting packs and invoking `pack-objects`. Since we
> >     do not want to include objects in these packs (because we don't know
> >     which of their objects are or aren't reachable), these are also
> >     marked as kept in-core.
>
> Here, "are presumed" is doing a lot of work. Theoretically, there could
> be three categories:
>
> 1. This pack was just repacked and will be removed because all of its
>    objects were placed into new objects.
>
> 2. Either this pack was repacked and contains important reachable objects
>    OR we did a repack of reachable objects and this pack contained some
>    extra, unreachable objects.
>
> 3. This pack was added to the repository while creating those repacked
>    packs from category 2, so we don't know if things are reachable or
>    not.
>
> So, the packs that we discover on-disk but are not specified over stdin
> are in this third category, but these are grouped with category 1 as we
> will treat them the same.

Ah, I think I caused some unintentional confusion by attaching "are
presumed" to "these packs", when it wasn't clear that "these packs"
meant "ones that aren't listed over stdin".

Since the caller is supposed to provide a complete picture of the
repository as they see it, any packs known to the pack-objects process
that aren't mentioned over stdin are assumed to have entered the
repository after the caller was spun up.

I'll clarify this section of the commit message, since I agree it is
unnecessarily confusing.

> >   - Then, we enumerate all objects in the repository, and add them to
> >     our packing list if they do not appear in an in-core kept pack.
>
> Here, we are looking at all of the objects in category 2 as well as
> loose objects.

We're enumerating any objects that aren't in packs which are marked as
kept in-core (along with loose objects which don't appear in packs that
are marked as kept in-core).

The in-core kept packs are ones that the caller (and I find it's helpful
to read "the caller" as "git repack") has marked as "will delete". So
the non in-core pack(s) that we're looking at here contain all reachable
objects (e.g., like you would get with `git repack -A`).

> > +	Packs unreachable objects into a separate "cruft" pack, denoted
> > +	by the existence of a `.mtimes` file. Pack names provided over
> > +	stdin indicate which packs will remain after a `git repack`.
> > +	Pack names prefixed with a `-` indicate those which will be
> > +	removed. (...)
>
> This description is too tied to 'git repack'. Can we describe the
> input using terms independent of the 'git repack' operation? I need
> to keep reading.
>
> > (...) The contents of the cruft pack are all objects not
> > +	contained in the surviving packs specified by `--keep-pack`)
>
> Now you use --keep-pack, which is a way of specifying a pack as
> "in-core keep" which was not in your commit message. Here, we also
> don't link the packs over stdin to the concept of keep packs.

The mention of `--keep-pack` is a mistake left over from a previous
version; thanks for spotting. Here's a version of the first paragraph
from this piece of documentation which is less tied to `git repack` and
hopefully a little clearer:

    --cruft::
            Packs unreachable objects into a separate "cruft" pack, denoted
            by the existence of a `.mtimes` file. Typically used by `git
            repack --cruft`. Callers provide a list of pack names and
            indicate which packs will remain in the repository, along with
            which packs will be deleted (indicated by the `-` prefix). The
            contents of the cruft pack are all objects not contained in the
            surviving packs which have not exceeded the grace period (see
            `--cruft-expiration` below), or which have exceeded the grace
            period, but are reachable from an other object which hasn't.

> > +	which have not exceeded the grace period (see
> > +	`--cruft-expiration` below), or which have exceeded the grace
> > +	period, but are reachable from an other object which hasn't.
>
> And now we think about the grace period! There is so much going on
> that I need to break it down to understand.
>
>   An object is _excluded_ from the new cruft pack if
>
>   1. It is reachable from at least one reference.
>   2. It is in a pack from stdin prefixed with "-"
>   3. It is in a pack specified by `--keep-pack`
>   4. It is in an existing cruft pack and the .mtimes file states
>      that its mtime is at least as recent as the time specified by
>      the --cruft-expiration option.
>
> Breaking it down into a list like this helps me, at least. I'm not
> sure what the best way would look like.

Given some expiration T, cruft packs contain all unreachable objects
which are newer than T, along with any cruft objects (i.e., those not
directly reachable from any ref) which are older than T, but reachable
from another cruft object newer than T.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux