Re: [PATCH v5] builtin/pack-objects.c: freshen objects from existing cruft packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Taylor Blau <me@xxxxxxxxxxxx> writes:

> Once an object is written into a cruft pack, we can only freshen it by
> writing a new loose or packed copy of that object with a more recent
> mtime.
>
> Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
> with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
> a repository at any given time. So freshening unreachable objects was
> straightforward when already rewriting the cruft pack (and its *.mtimes
> file).
>
> But 61568efa95 changes things: 'pack-objects' now supports writing
> multiple cruft packs when invoked with `--cruft` and the
> `--max-pack-size` flag. Cruft packs are rewritten until they reach some
> size threshold, at which point they are considered "frozen", and will
> only be modified in a pruning GC, or if the threshold itself is
> adjusted.
>
> Prior to this patch, however, this process breaks down when we attempt
> to freshen an object packed in an earlier cruft pack, and that cruft
> pack is larger than the threshold and thus will survive the repack.
>
> When this is the case, it is impossible to freshen objects in cruft
> pack(s) when those cruft packs are larger than the threshold. This is
> because we would avoid writing them in the new cruft pack entirely, for
> a couple of reasons.
>
>  1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
>     we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
>     over the packs we're going to retain (which are marked as kept
>     in-core by 'read_cruft_objects()').
>
>     This means that we will avoid enumerating additional packed copies
>     of objects found in any cruft packs which are larger than the given
>     size threshold. Thus there is no opportunity to call
>     'create_object_entry()' whatsoever.
>
>  2. We likewise will discard the loose copy (if one exists) of any
>     unreachable object packed in a cruft pack that is larger than the
>     threshold. Here our call path is 'add_unreachable_loose_objects()',
>     which uses the 'add_loose_object()' callback.
>
>     That function will eventually land us in 'want_object_in_pack()'
>     (via 'add_cruft_object_entry()'), and we'll discard the object as it
>     appears in one of the packs which we marked as kept in-core.
>
> This means in effect that it is impossible to freshen an unreachable
> object once it appears in a cruft pack larger than the given threshold.
>
> Instead, we should pack an additional copy of an unreachable object we
> want to freshen even if it appears in a cruft pack, provided that the
> cruft copy has an mtime which is before the mtime of the copy we are
> trying to pack/freshen. This is sub-optimal in the sense that it
> requires keeping an additional copy of unreachable objects upon
> freshening, but we don't have a better alternative without the ability
> to make in-place modifications to existing *.mtimes files.
>
> In order to implement this, we have to adjust the behavior of
> 'want_found_object()'. When 'pack-objects' is told that we're *not*
> going to retain any cruft packs (i.e. the set of packs marked as kept
> in-core does not contain a cruft pack), the behavior is unchanged.
>
> But when there *is* at least one cruft pack that we're holding onto, it
> is no longer sufficient to reject a copy of an object found in that
> cruft pack for that reason alone. In this case, we only want to reject a
> candidate object when copies of that object either:
>
>  - exists in a non-cruft pack that we are retaining, regardless of that
>    pack's mtime, or
>
>  - exists in a cruft pack with an mtime at least as recent as the copy
>    we are debating whether or not to pack, in which case freshening
>    would be redundant.
>
> To do this, keep track of whether or not we have any cruft packs in our
> in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
> flag. When we end up in this new special case, we replace a call to
> 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
> objects when we have a copy in an existing cruft pack with at least as
> recent an mtime as our candidate (in which case "freshening" would be
> redundant).
>
> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
> ---
>  builtin/pack-objects.c  | 118 ++++++++++++++++++++++++++++++++++------
>  packfile.c              |   3 +-
>  packfile.h              |   2 +
>  t/t7704-repack-cruft.sh |  66 ++++++++++++++++++++++
>  4 files changed, 171 insertions(+), 18 deletions(-)

Thanks.  Will queue.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux