On Mon, Dec 06, 2021 at 04:44:31PM -0500, Derrick Stolee wrote: > On 11/29/2021 5:25 PM, Taylor Blau wrote: > > Generating a non-expiring cruft packs works as follows: > > I had trouble parsing the documentation changes below, so I came back > to this commit message to see if that helps. > > > - Callers provide a list of every pack they know about, and indicate > > which packs are about to be removed. > > This corresponds to the list over stdin. > > > - All packs which are going to be removed (we'll call these the > > redundant ones) are marked as kept in-core, as well as any packs > > that `pack-objects` found but the caller did not specify. > > Ok, so as an implementation detail we mark these as keep packs. > > These packs are presumed to have entered the repository between > > the caller collecting packs and invoking `pack-objects`. Since we > > do not want to include objects in these packs (because we don't know > > which of their objects are or aren't reachable), these are also > > marked as kept in-core. > > Here, "are presumed" is doing a lot of work. Theoretically, there could > be three categories: > > 1. This pack was just repacked and will be removed because all of its > objects were placed into new objects. > > 2. Either this pack was repacked and contains important reachable objects > OR we did a repack of reachable objects and this pack contained some > extra, unreachable objects. > > 3. This pack was added to the repository while creating those repacked > packs from category 2, so we don't know if things are reachable or > not. > > So, the packs that we discover on-disk but are not specified over stdin > are in this third category, but these are grouped with category 1 as we > will treat them the same. Ah, I think I caused some unintentional confusion by attaching "are presumed" to "these packs", when it wasn't clear that "these packs" meant "ones that aren't listed over stdin". Since the caller is supposed to provide a complete picture of the repository as they see it, any packs known to the pack-objects process that aren't mentioned over stdin are assumed to have entered the repository after the caller was spun up. I'll clarify this section of the commit message, since I agree it is unnecessarily confusing. > > - Then, we enumerate all objects in the repository, and add them to > > our packing list if they do not appear in an in-core kept pack. > > Here, we are looking at all of the objects in category 2 as well as > loose objects. We're enumerating any objects that aren't in packs which are marked as kept in-core (along with loose objects which don't appear in packs that are marked as kept in-core). The in-core kept packs are ones that the caller (and I find it's helpful to read "the caller" as "git repack") has marked as "will delete". So the non in-core pack(s) that we're looking at here contain all reachable objects (e.g., like you would get with `git repack -A`). > > + Packs unreachable objects into a separate "cruft" pack, denoted > > + by the existence of a `.mtimes` file. Pack names provided over > > + stdin indicate which packs will remain after a `git repack`. > > + Pack names prefixed with a `-` indicate those which will be > > + removed. (...) > > This description is too tied to 'git repack'. Can we describe the > input using terms independent of the 'git repack' operation? I need > to keep reading. > > > (...) The contents of the cruft pack are all objects not > > + contained in the surviving packs specified by `--keep-pack`) > > Now you use --keep-pack, which is a way of specifying a pack as > "in-core keep" which was not in your commit message. Here, we also > don't link the packs over stdin to the concept of keep packs. The mention of `--keep-pack` is a mistake left over from a previous version; thanks for spotting. Here's a version of the first paragraph from this piece of documentation which is less tied to `git repack` and hopefully a little clearer: --cruft:: Packs unreachable objects into a separate "cruft" pack, denoted by the existence of a `.mtimes` file. Typically used by `git repack --cruft`. Callers provide a list of pack names and indicate which packs will remain in the repository, along with which packs will be deleted (indicated by the `-` prefix). The contents of the cruft pack are all objects not contained in the surviving packs which have not exceeded the grace period (see `--cruft-expiration` below), or which have exceeded the grace period, but are reachable from an other object which hasn't. > > + which have not exceeded the grace period (see > > + `--cruft-expiration` below), or which have exceeded the grace > > + period, but are reachable from an other object which hasn't. > > And now we think about the grace period! There is so much going on > that I need to break it down to understand. > > An object is _excluded_ from the new cruft pack if > > 1. It is reachable from at least one reference. > 2. It is in a pack from stdin prefixed with "-" > 3. It is in a pack specified by `--keep-pack` > 4. It is in an existing cruft pack and the .mtimes file states > that its mtime is at least as recent as the time specified by > the --cruft-expiration option. > > Breaking it down into a list like this helps me, at least. I'm not > sure what the best way would look like. Given some expiration T, cruft packs contain all unreachable objects which are newer than T, along with any cruft objects (i.e., those not directly reachable from any ref) which are older than T, but reachable from another cruft object newer than T. Thanks, Taylor