On Wed, Apr 12, 2023 at 02:37:53PM -0400, Taylor Blau wrote: > On Wed, Apr 12, 2023 at 12:22:35PM +0200, Patrick Steinhardt wrote: > > When doing a geometric repack with multi-pack-indices, then we ask > > git-multi-pack-index(1) to use the largest packfile as the preferred > > pack. It can happen though that the largest packfile is not part of the > > main object database, but instead part of an alternate object database. > > The result is that git-multi-pack-index(1) will not be able to find the > > preferred pack and print a warning. It then falls back to use the first > > packfile that the multi-pack-index shall reference. > > > > Fix this bug by only considering packfiles as preferred pack that are > > local. This is the right thing to do given that a multi-pack-index > > should never reference packfiles borrowed from an alternate. > > > > While at it, rename the function `get_largest_active_packfile()` to > > `get_preferred_pack()` to better document its intent. > > > > Signed-off-by: Patrick Steinhardt <ps@xxxxxx> > > > @@ -464,7 +466,16 @@ static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry > > } > > if (geometry->split == geometry->pack_nr) > > return NULL; > > - return geometry->pack[geometry->pack_nr - 1]; > > + > > + for (i = geometry->pack_nr; i > 0; i--) > > + /* > > + * A pack that is not local would never be included in a > > + * multi-pack index. We thus skip over any non-local packs. > > + */ > > + if (geometry->pack[i - 1]->pack_local) > > + return geometry->pack[i - 1]; > > + > > + return NULL; > > } > > Looking good, we want to go through this list in reverse order, since > the packs are ordered largest to smallest. I think that you could > slightly rewrite the loop condition to avoid having to always access > `geometry->pack` at `i-1` instead of `i`, like so: > > --- 8< --- > diff --git a/builtin/repack.c b/builtin/repack.c > index 9d36dc8b84..ba468ac44e 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -467,13 +467,15 @@ static struct packed_git *get_preferred_pack(struct pack_geometry *geometry) > if (geometry->split == geometry->pack_nr) > return NULL; > > - for (i = geometry->pack_nr; i > 0; i--) > + for (i = geometry->pack_nr - 1; i >= 0; i--) { > /* > * A pack that is not local would never be included in a > * multi-pack index. We thus skip over any non-local packs. > */ > - if (geometry->pack[i - 1]->pack_local) > - return geometry->pack[i - 1]; > + struct packed_git *p = geometry->pack[i]; > + if (p->pack_local) > + return p; > + } > > return NULL; > } > --- >8 --- There's a gotcha: `i` is an `unit32_t`, so `i >= 0` would always be true and thus we wrap around and would try to access the pack array at an out-of-bound index. > but I'm not sure that the loop condition is quite right to begin with. > We don't want to iterate all the way down to the beginning of the pack > list, since some of those packs may be below the "split" line, IOW their > contents are going to be rolled up and the packs destroyed. > > So I think the right loop condition would be: > > for (i = geometry->pack_nr - 1; i >= geometry->split; i--) > > which I think makes the "if (geometry->split == geometry->pack_nr)" > condition above redundant with this loop, so you could probably drop > that. > > I would definitely appreciate a second set of eye here. The pack *at* > the split line is considered frozen (IOW, we create a new pack > consisting of the packs in range [0, geometry->split), and leave the > packs in range [geometry->split, geometry->nr) alone). > > So it should be fine to enter that loop when "i == geometry->split", > because it's OK to return that as a potential preferred pack. That makes sense indeed. Will amend. [snip] > > + test $(wc -l <packs) = 1 && > > + > > + # We should also see a multi-pack-index. This multi-pack-index should > > + # never refer to any packfiles in the alternate object database. > > + # Consequentially, it should be valid even with the alternate ODB > > + # deleted. > > + rm -r shared && > > + git -C member multi-pack-index verify > > To test this even more directly, I think that you could ensure that the > set of packs (which should just be the single entry found in "packs" > above) matches what we expect. That should look something like: > > test-tool read-midx member/.git/objects >packs.midx && > grep "^pack-.*\.idx$" packs.midx | sort >actual && > xargs -n 1 basename <packs | sort >expect > test_cmp expect actual > > Note that the read-midx test helper outputs packs with the "*.idx" > suffix, so you'd probably want to alter your find invocation a few lines > above accordingly, or rewrite it before writing into "actual". > > Alternatively, since we expect there to be just a single pack here, I > think we could just as easily write something like: > > test-tool read-midx member/.git/objects >packs.midx && > grep "^pack-.*\.idx$" packs.midx >actual && > test_line_count = 1 actual && > grep $(cat actual) packs > > though I slightly prefer the former since it is less brittle to changing > this test. Either way, though. Thanks, I've adopted that approach with a slight modification. Patrick
Attachment:
signature.asc
Description: PGP signature