Re: [PATCH v2 0/2] pack-objects: freshen objects with multi-cruft packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 4, 2025 at 1:35 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> This is a minor reroll of my series to fix a bug in freshening objects
> with multiple cruft packs.
>
> Since last time the only updates to the series are textual changes to
> the commit message, so everything functionally should be working as it
> was before.
>
> As usual, there is a range-diff showing as much below, and the original
> cover letter is as follows:
>
> ---
>
> This short series contains a fix for a bug I noticed while rolling out
> multi-cruft packs (via 'git repack --max-cruft-size') within GitHub's
> infrastructure.
>
> The series is structured as follows:
>
>  - The first patch simplifies how 'repack' aggregates cruft packs
>    together when their size is below the '--max-cruft-size' or
>    '--max-pack-size' threshold. This simplification changes behavior
>    slightly, but not in a meaningful way. It occurred to me while
>    writing the second patch.
>
>  - The second patch describes and fixes the main bug. The gist here is
>    that objects which are (a) unreachable, (b) exist in a cruft pack
>    being retained, and (c) were freshened to have a more recent mtime
>    than any existing cruft copy are unable to be freshened.
>
> The fix pursued in the second patch changes the rules around when we
> want to retain an object via builtin/pack-objects.c::want_found_object()
> when at least one cruft pack will survive the repack.
>
> Previously the rule was to discard any object which appears in any
> surviving pack, regardless of mtime. The rule now is to only discard an
> object if it appears in either (a) a non-cruft pack which will survive
> the repack, or (b) a cruft pack whose mtime for that object is older
> than the one we are trying to pack.
>
> I think that this is the right behavior, but admittedly putting this
> series together hurt my brain trying to think through all of the cases.
> I'm fairly confident in the testing here as I remember it being fairly
> exhaustive of all interesting cases. But I'd appreciate a sanity check
> from others that they too are convinced this is the right approach.
>
> Thanks in advance for your review!
>
> Taylor Blau (2):
>   builtin/repack.c: simplify cruft pack aggregation
>   builtin/pack-objects.c: freshen objects from existing cruft packs
>
>  builtin/pack-objects.c  | 118 ++++++++++++++++++++++++++++++++++------
>  builtin/repack.c        |  38 +------------
>  packfile.c              |   3 +-
>  packfile.h              |   2 +
>  t/t7704-repack-cruft.sh | 105 +++++++++++++++++++++--------------
>  5 files changed, 170 insertions(+), 96 deletions(-)
>
> Range-diff against v1:
> 1:  8564f982597 ! 1:  63ea9d4d00e builtin/repack.c: simplify cruft pack aggregation
>     @@ Commit message
>          would get combined together until the sum of their sizes was no larger
>          than the given max pack size.
>
>     -    There is a much simpler way to achieve this, however, which is to simply
>     -    combine *all* cruft packs which are smaller than the threshold,
>     +    There is a much simpler way to combine cruft packs, however, which is to
>     +    simply combine *all* cruft packs which are smaller than the threshold,
>          regardless of what their sum is. With '--max-pack-size', 'pack-objects'
>          will split out the resulting pack into individual pack(s) if necessary
>          to ensure that the written pack(s) are each no larger than the provided
> 2:  c0c926adde2 ! 2:  7ba9054701b builtin/pack-objects.c: freshen objects from existing cruft packs
>     @@ Commit message
>          only be modified in a pruning GC, or if the threshold itself is
>          adjusted.
>
>     -    However, this process breaks down when we attempt to freshen an object
>     -    packed in an earlier cruft pack that is larger than the threshold and
>     -    thus will survive the repack.
>     +    Prior to this patch, however, this process breaks down when we attempt
>     +    to freshen an object packed in an earlier cruft pack, and that cruft
>     +    pack is larger than the threshold and thus will survive the repack.
>
>          When this is the case, it is impossible to freshen objects in cruft
>     -    pack(s) which are larger than the threshold. This is because we avoid
>     -    writing them in the new cruft pack entirely, for a couple of reasons.
>     +    pack(s) when those cruft packs are larger than the threshold. This is
>     +    because we would avoid writing them in the new cruft pack entirely, for
>     +    a couple of reasons.
>
>           1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
>              we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
>     @@ Commit message
>           - exists in a non-cruft pack that we are retaining, regardless of that
>             pack's mtime, or
>
>     -     - exists in a cruft pack with an mtime more recent than the copy we are
>     -       debating whether or not to pack, in which case freshening would be
>     -       redundant.
>     +     - exists in a cruft pack with an mtime at least as recent as the copy
>     +       we are debating whether or not to pack, in which case freshening
>     +       would be redundant.
>
>          To do this, keep track of whether or not we have any cruft packs in our
>          in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
>          flag. When we end up in this new special case, we replace a call to
>     -    'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only
>     -    reject objects when we have a copy in an existing cruft pack with a more
>     -    recent mtime (in which case "freshening" would be redundant).
>     +    'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
>     +    objects when we have a copy in an existing cruft pack with at least as
>     +    recent an mtime as our candidate (in which case "freshening" would be
>     +    redundant).
>
>          Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
>
>     @@ t/t7704-repack-cruft.sh: test_expect_success '--max-cruft-size with freshened ob
>      +
>      +          git repack --cruft -d &&
>      +
>     -+          # Make a packed copy of object $foo with a more recent
>     -+          # mtime.
>     ++          # Make an identical copy of foo stored in a pack with a more
>     ++          # recent mtime.
>      +          foo="$(generate_random_blob foo $((2*1024*1024)))" &&
>      +          foo_pack="$(echo "$foo" | git pack-objects $packdir/pack)" &&
>      +          test-tool chmtime --get -100 \
>      +                  "$packdir/pack-$foo_pack.pack" >foo.new &&
>      +          git prune-packed &&
>      +
>     -+          # Make a loose copy of object $bar with a more recent
>     -+          # mtime.
>     ++          # Make a loose copy of bar, also with a more recent mtime.
>      +          bar="$(generate_random_blob bar $((2*1024*1024)))" &&
>      +          test-tool chmtime --get -100 \
>      +                  "$objdir/$(test_oid_to_path "$bar")" >bar.new &&
>
> base-commit: 08bdfd453584e489d5a551aecbdcb77584e1b958
> --
> 2.49.0.rc0.57.gdb91954e186.dirty

v2 looks good to me; though I'm curious if some wording improvement in
the commit message might help in distinguishing between
--max-cruft-size and --max-pack-size...and whether we want to provide
any checks on the relative sizes of the two.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux