[PATCH v2 0/2] pack-objects: freshen objects with multi-cruft packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a minor reroll of my series to fix a bug in freshening objects
with multiple cruft packs.

Since last time the only updates to the series are textual changes to
the commit message, so everything functionally should be working as it
was before.

As usual, there is a range-diff showing as much below, and the original
cover letter is as follows:

---

This short series contains a fix for a bug I noticed while rolling out
multi-cruft packs (via 'git repack --max-cruft-size') within GitHub's
infrastructure.

The series is structured as follows:

 - The first patch simplifies how 'repack' aggregates cruft packs
   together when their size is below the '--max-cruft-size' or
   '--max-pack-size' threshold. This simplification changes behavior
   slightly, but not in a meaningful way. It occurred to me while
   writing the second patch.

 - The second patch describes and fixes the main bug. The gist here is
   that objects which are (a) unreachable, (b) exist in a cruft pack
   being retained, and (c) were freshened to have a more recent mtime
   than any existing cruft copy are unable to be freshened.

The fix pursued in the second patch changes the rules around when we
want to retain an object via builtin/pack-objects.c::want_found_object()
when at least one cruft pack will survive the repack.

Previously the rule was to discard any object which appears in any
surviving pack, regardless of mtime. The rule now is to only discard an
object if it appears in either (a) a non-cruft pack which will survive
the repack, or (b) a cruft pack whose mtime for that object is older
than the one we are trying to pack.

I think that this is the right behavior, but admittedly putting this
series together hurt my brain trying to think through all of the cases.
I'm fairly confident in the testing here as I remember it being fairly
exhaustive of all interesting cases. But I'd appreciate a sanity check
from others that they too are convinced this is the right approach.

Thanks in advance for your review!

Taylor Blau (2):
  builtin/repack.c: simplify cruft pack aggregation
  builtin/pack-objects.c: freshen objects from existing cruft packs

 builtin/pack-objects.c  | 118 ++++++++++++++++++++++++++++++++++------
 builtin/repack.c        |  38 +------------
 packfile.c              |   3 +-
 packfile.h              |   2 +
 t/t7704-repack-cruft.sh | 105 +++++++++++++++++++++--------------
 5 files changed, 170 insertions(+), 96 deletions(-)

Range-diff against v1:
1:  8564f982597 ! 1:  63ea9d4d00e builtin/repack.c: simplify cruft pack aggregation
    @@ Commit message
         would get combined together until the sum of their sizes was no larger
         than the given max pack size.
     
    -    There is a much simpler way to achieve this, however, which is to simply
    -    combine *all* cruft packs which are smaller than the threshold,
    +    There is a much simpler way to combine cruft packs, however, which is to
    +    simply combine *all* cruft packs which are smaller than the threshold,
         regardless of what their sum is. With '--max-pack-size', 'pack-objects'
         will split out the resulting pack into individual pack(s) if necessary
         to ensure that the written pack(s) are each no larger than the provided
2:  c0c926adde2 ! 2:  7ba9054701b builtin/pack-objects.c: freshen objects from existing cruft packs
    @@ Commit message
         only be modified in a pruning GC, or if the threshold itself is
         adjusted.
     
    -    However, this process breaks down when we attempt to freshen an object
    -    packed in an earlier cruft pack that is larger than the threshold and
    -    thus will survive the repack.
    +    Prior to this patch, however, this process breaks down when we attempt
    +    to freshen an object packed in an earlier cruft pack, and that cruft
    +    pack is larger than the threshold and thus will survive the repack.
     
         When this is the case, it is impossible to freshen objects in cruft
    -    pack(s) which are larger than the threshold. This is because we avoid
    -    writing them in the new cruft pack entirely, for a couple of reasons.
    +    pack(s) when those cruft packs are larger than the threshold. This is
    +    because we would avoid writing them in the new cruft pack entirely, for
    +    a couple of reasons.
     
          1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
             we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
    @@ Commit message
          - exists in a non-cruft pack that we are retaining, regardless of that
            pack's mtime, or
     
    -     - exists in a cruft pack with an mtime more recent than the copy we are
    -       debating whether or not to pack, in which case freshening would be
    -       redundant.
    +     - exists in a cruft pack with an mtime at least as recent as the copy
    +       we are debating whether or not to pack, in which case freshening
    +       would be redundant.
     
         To do this, keep track of whether or not we have any cruft packs in our
         in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
         flag. When we end up in this new special case, we replace a call to
    -    'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only
    -    reject objects when we have a copy in an existing cruft pack with a more
    -    recent mtime (in which case "freshening" would be redundant).
    +    'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
    +    objects when we have a copy in an existing cruft pack with at least as
    +    recent an mtime as our candidate (in which case "freshening" would be
    +    redundant).
     
         Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
     
    @@ t/t7704-repack-cruft.sh: test_expect_success '--max-cruft-size with freshened ob
     +
     +		git repack --cruft -d &&
     +
    -+		# Make a packed copy of object $foo with a more recent
    -+		# mtime.
    ++		# Make an identical copy of foo stored in a pack with a more
    ++		# recent mtime.
     +		foo="$(generate_random_blob foo $((2*1024*1024)))" &&
     +		foo_pack="$(echo "$foo" | git pack-objects $packdir/pack)" &&
     +		test-tool chmtime --get -100 \
     +			"$packdir/pack-$foo_pack.pack" >foo.new &&
     +		git prune-packed &&
     +
    -+		# Make a loose copy of object $bar with a more recent
    -+		# mtime.
    ++		# Make a loose copy of bar, also with a more recent mtime.
     +		bar="$(generate_random_blob bar $((2*1024*1024)))" &&
     +		test-tool chmtime --get -100 \
     +			"$objdir/$(test_oid_to_path "$bar")" >bar.new &&

base-commit: 08bdfd453584e489d5a551aecbdcb77584e1b958
-- 
2.49.0.rc0.57.gdb91954e186.dirty




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux