Re: [PATCH v5 0/8] Repack objects into separate packfiles based on a filter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 15, 2023 at 03:32:23PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> > I think the reason that this succeeds is that we already have a bitmap,
> > and it likely reuses all of the existing bitmaps before discovering that
> > the pack we wrote doesn't contain all objects.
>
> Now I am confused.
>
> We were asked to write bitmap index when we are going to create an
> incomplete pack, and the packfile we generate with the filter will
> not have full set of objects, and generating a bitmap with such an
> incomplete knowledge of what objects are reachable from what would
> be a disaster, so we should turn it off.  But the posted patch
> lacked such a "we should abort when bitmap is asked to be written
> while filtering" logic.

I was similarly confused, and started writing a patch to detect when we
see objects in one bitmap but not the other when remapping. But we
already handle that case, see the call to `rebuild_bitmap()` from
`fill_bitmap_commit()` in pack-bitmap-write.c.

So I don't think we'd ever end up reusing an existing bitmap that refers
to objects that we don't have.

But something is definitely strange here. The bitmap generated by this
test claims to have three commits:

    $ ~/src/git/t/helper/test-tool bitmap list-commits
    95a9e53327b06212dcf98bd44794b0e2b913deab
    3677360288c631b6b2e1f0e1f081b1e518605e9f
    6f105e6234717c52e9b117b08840926910a68314

...but none of them actually appear to exist in the bitmap:

    $ git rev-list --test-bitmap 95a9e53327b06212dcf98bd44794b0e2b913deab
    Bitmap v1 test (3 entries loaded)
    Found bitmap for '95a9e53327b06212dcf98bd44794b0e2b913deab'. 64 bits / 8b3b6ee7 checksum
    fatal: object not in bitmap: 'ac3e272b72bbf89def8657766b855d0656630ed4'

I think what's going on here is that we attempt to create bitmaps for
all three of those commits. We then try and reuse the existing bitmaps,
but fail, because we are missing some objects.

So then we try and generate the bitmap from scratch, and when we get
down to fill_bitmap_tree() we look up the bit position of the tree
itself, and find a non-zero answer, indicating that we have already
marked that tree.

And fill_bitmap_tree() correctly assumes that if we have marked the bit
corresponding to the tree, that everything reachable from that tree has
also been marked. So we never try and locate the bit position for the
blob, since we already think that we have a blob marked in the resulting
bitmap!

But why is that tree marked in the first place? It's because we attempt
to rebuild the bitmap from the existing .bitmap file, but fail part of
the way through (when we look up the first blob object in the reposition
table). But that happens *after* we see the tree object, so its bit
position is marked, even though we didn't rebuild a complete bitmap.

I don't think this matters outside of filtered repacks, but it would be
a serious bug to not catch this earlier up like suggested in the
(quoted) patch below.

> > but I wonder if a more complete fix would be something like:
> > ...
> > @@ -966,6 +972,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
> >  		die(_(incremental_bitmap_conflict_error));
> >
> > +	if (write_bitmaps && po_args.filter_options.choice)
> > +		die(_(filtered_bitmap_conflict_error));
> > +
>
> It sounds like the most direct fix.

I agree.

I think that we would be OK to not change the implementation of
rebuild_bitmap(), or its caller in fill_bitmap_commit(), since this only
bites us when bitmapping a filtered pack, and we should catch that case
well before getting this deep into the bitmap code.

But it does seem suspect that we rebuild right into ent->bitmap, so we
may want to consider doing something like:

--- >8 ---
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index f6757c3cbf..f4ecdf8b0e 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -413,15 +413,19 @@ static int fill_bitmap_commit(struct bb_commit *ent,

 		if (old_bitmap && mapping) {
 			struct ewah_bitmap *old = bitmap_for_commit(old_bitmap, c);
+			struct bitmap *remapped = bitmap_new();
 			/*
 			 * If this commit has an old bitmap, then translate that
 			 * bitmap and add its bits to this one. No need to walk
 			 * parents or the tree for this commit.
 			 */
-			if (old && !rebuild_bitmap(mapping, old, ent->bitmap)) {
+			if (old && !rebuild_bitmap(mapping, old, remapped)) {
+				bitmap_or(ent->bitmap, remapped);
+				bitmap_free(remapped);
 				reused_bitmaps_nr++;
 				continue;
 			}
+			bitmap_free(remapped);
 		}

 		/*
--- 8< ---

on top.

Applying that patch and then rerunning the tests with the appropriate
TEST variables causes the 'git repack' to fail as expected, ensuring
that the containing test passes.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux