Hi Derrick, Thanks for a swift and comprehensive review. > On May 5, 2020, at 15:50, Derrick Stolee <stolee@xxxxxxxxx> wrote: > > In the scenario where there is a .keep pack _and_ it is small enough to get > picked up by the batch size, the 'git multi-pack-index repack' command will > create a new pack containing its objects (and objects from other packs) but > the 'git multi-pack-index expire' command will not delete the pack with .keep. > > The good news is that after the first repack, the objects in the pack are > in a newer pack, so the multi-pack-index will not repack those objects from > that pack multiple times. However, this may be unintended behavior for the > user that specified the .keep pack. Yup I experienced exactly this when trying to test midx repack/expire with biggest pack file marked with `.keep`. Luckily the storage size bump for duplicated objects was not noticeable in my case. You worded the situation precisely. > I think the right thing to do to respect "repack.packKeptObjects = false" is > to ignore the packs when selecting the batch of objects. Instead of asking > you to do this, I added a patch below. Please take it into your v2, if you > don't mind. Gladly. This should help me a lot for re-rolling V2. >> +static int delta_base_offset = 1; >> +static int write_bitmaps = -1; >> +static int use_delta_islands; >> + > > Why not make these local to the midx_repack method? No practical reason except me shamelessly lifted those from builtin/repack.c. I was a bit confused how `git repack` houses these logic in the builtin file, while midx was having these logic in the midx.c instead of builtin/multi-pack-index.c. I make them local in V2. >> int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags) >> { >> int result = 0; >> @@ -1381,12 +1385,25 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, >> } else if (fill_included_packs_all(m, include_pack)) >> goto cleanup; >> >> + git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset); >> + git_config_get_bool("repack.writebitmaps", &write_bitmaps); >> + git_config_get_bool("repack.usedeltaislands", &use_delta_islands); >> + > > It looks like you have some spacing issues here. Perhaps use tabs? Rookie mistake on my part. Will fix it in V2 >> + if (write_bitmaps > 0) >> + argv_array_push(&cmd.args, "--write-bitmap-index"); >> + else if (write_bitmaps < 0) >> + argv_array_push(&cmd.args, "--write-bitmap-index-quiet"); > > These make less sense. Unless --batch-size=0 and there are no .keep > packs (with the patch below) I'm not sure we _can_ write bitmap indexes > here. The pack-file is not necessarily closed under reachability. Or, > will supplying these arguments to 'git pack-objects' actually do that > closure? > > I would be happy to special-case these options to the "--batch-size=0" > situation and otherwise ignore them. This then gets into enough > complication that we should update the documentation as in the patch > below. You make a great point here. I completely missed this as I have been largely testing with repacking only 2 packs, effectively with --batch-size=0. I think having the bitmaps index is highly desirable in `--batch-size=0` case. I will try to include that in V2 (with Documentation). > At minimum, it would be good to have some tests that exercise these > code paths so we know they are behaving correctly. I will do some readings with the current tests for repack and midx. Hopefully I will have something for V2. (^_^ !) > Thanks, > -Stolee Cheers, Son Luong