Taylor Blau <me@xxxxxxxxxxxx> writes: > Once an object is written into a cruft pack, we can only freshen it by > writing a new loose or packed copy of that object with a more recent > mtime. > > Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size` > with `--cruft`, 2023-08-28), we typically had at most one cruft pack in > a repository at any given time. So freshening unreachable objects was > straightforward when already rewriting the cruft pack (and its *.mtimes > file). > > But 61568efa95 changes things: 'pack-objects' now supports writing > multiple cruft packs when invoked with `--cruft` and the > `--max-pack-size` flag. Cruft packs are rewritten until they reach some > size threshold, at which point they are considered "frozen", and will > only be modified in a pruning GC, or if the threshold itself is > adjusted. > > Prior to this patch, however, this process breaks down when we attempt > to freshen an object packed in an earlier cruft pack, and that cruft > pack is larger than the threshold and thus will survive the repack. > > When this is the case, it is impossible to freshen objects in cruft > pack(s) when those cruft packs are larger than the threshold. This is > because we would avoid writing them in the new cruft pack entirely, for > a couple of reasons. > > 1. When enumerating packed objects via 'add_objects_in_unpacked_packs()' > we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping > over the packs we're going to retain (which are marked as kept > in-core by 'read_cruft_objects()'). > > This means that we will avoid enumerating additional packed copies > of objects found in any cruft packs which are larger than the given > size threshold. Thus there is no opportunity to call > 'create_object_entry()' whatsoever. > > 2. We likewise will discard the loose copy (if one exists) of any > unreachable object packed in a cruft pack that is larger than the > threshold. Here our call path is 'add_unreachable_loose_objects()', > which uses the 'add_loose_object()' callback. > > That function will eventually land us in 'want_object_in_pack()' > (via 'add_cruft_object_entry()'), and we'll discard the object as it > appears in one of the packs which we marked as kept in-core. > > This means in effect that it is impossible to freshen an unreachable > object once it appears in a cruft pack larger than the given threshold. > > Instead, we should pack an additional copy of an unreachable object we > want to freshen even if it appears in a cruft pack, provided that the > cruft copy has an mtime which is before the mtime of the copy we are > trying to pack/freshen. This is sub-optimal in the sense that it > requires keeping an additional copy of unreachable objects upon > freshening, but we don't have a better alternative without the ability > to make in-place modifications to existing *.mtimes files. > > In order to implement this, we have to adjust the behavior of > 'want_found_object()'. When 'pack-objects' is told that we're *not* > going to retain any cruft packs (i.e. the set of packs marked as kept > in-core does not contain a cruft pack), the behavior is unchanged. > > But when there *is* at least one cruft pack that we're holding onto, it > is no longer sufficient to reject a copy of an object found in that > cruft pack for that reason alone. In this case, we only want to reject a > candidate object when copies of that object either: > > - exists in a non-cruft pack that we are retaining, regardless of that > pack's mtime, or > > - exists in a cruft pack with an mtime at least as recent as the copy > we are debating whether or not to pack, in which case freshening > would be redundant. > > To do this, keep track of whether or not we have any cruft packs in our > in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft' > flag. When we end up in this new special case, we replace a call to > 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject > objects when we have a copy in an existing cruft pack with at least as > recent an mtime as our candidate (in which case "freshening" would be > redundant). > > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > builtin/pack-objects.c | 118 ++++++++++++++++++++++++++++++++++------ > packfile.c | 3 +- > packfile.h | 2 + > t/t7704-repack-cruft.sh | 66 ++++++++++++++++++++++ > 4 files changed, 171 insertions(+), 18 deletions(-) Thanks. Will queue.