On 11/29/2021 5:25 PM, Taylor Blau wrote: > Generating a non-expiring cruft packs works as follows: I had trouble parsing the documentation changes below, so I came back to this commit message to see if that helps. > - Callers provide a list of every pack they know about, and indicate > which packs are about to be removed. This corresponds to the list over stdin. > - All packs which are going to be removed (we'll call these the > redundant ones) are marked as kept in-core, as well as any packs > that `pack-objects` found but the caller did not specify. Ok, so as an implementation detail we mark these as keep packs. > These packs are presumed to have entered the repository between > the caller collecting packs and invoking `pack-objects`. Since we > do not want to include objects in these packs (because we don't know > which of their objects are or aren't reachable), these are also > marked as kept in-core. Here, "are presumed" is doing a lot of work. Theoretically, there could be three categories: 1. This pack was just repacked and will be removed because all of its objects were placed into new objects. 2. Either this pack was repacked and contains important reachable objects OR we did a repack of reachable objects and this pack contained some extra, unreachable objects. 3. This pack was added to the repository while creating those repacked packs from category 2, so we don't know if things are reachable or not. So, the packs that we discover on-disk but are not specified over stdin are in this third category, but these are grouped with category 1 as we will treat them the same. > - Then, we enumerate all objects in the repository, and add them to > our packing list if they do not appear in an in-core kept pack. Here, we are looking at all of the objects in category 2 as well as loose objects. > This results in a new cruft pack which contains all known objects that > aren't included in the kept packs. When the kept pack is the result of > `git repack -A`, the resulting pack contains all unreachable objects. This now describes how 'git repack' will interface with this new change to pack-objects. I'll keep an eye out for that. > +--cruft:: Now getting to this description. > + Packs unreachable objects into a separate "cruft" pack, denoted > + by the existence of a `.mtimes` file. Pack names provided over > + stdin indicate which packs will remain after a `git repack`. > + Pack names prefixed with a `-` indicate those which will be > + removed. (...) This description is too tied to 'git repack'. Can we describe the input using terms independent of the 'git repack' operation? I need to keep reading. > (...) The contents of the cruft pack are all objects not > + contained in the surviving packs specified by `--keep-pack`) Now you use --keep-pack, which is a way of specifying a pack as "in-core keep" which was not in your commit message. Here, we also don't link the packs over stdin to the concept of keep packs. > + which have not exceeded the grace period (see > + `--cruft-expiration` below), or which have exceeded the grace > + period, but are reachable from an other object which hasn't. And now we think about the grace period! There is so much going on that I need to break it down to understand. An object is _excluded_ from the new cruft pack if 1. It is reachable from at least one reference. 2. It is in a pack from stdin prefixed with "-" 3. It is in a pack specified by `--keep-pack` 4. It is in an existing cruft pack and the .mtimes file states that its mtime is at least as recent as the time specified by the --cruft-expiration option. Breaking it down into a list like this helps me, at least. I'm not sure what the best way would look like. (Needing to pause here and look at the implementation later.) Thanks, -Stolee