Hi Victoria, On Fri, May 20, 2022 at 04:36:12PM +0000, Victoria Dye via GitGitGadget wrote: > From: Victoria Dye <vdye@xxxxxxxxxx> > > Update 'repack' to ignore packs named on the command line with the > '--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat > command line-kept packs the same way it treats packs with an on-disk '.keep' > file (that is, skip the pack and do not include it in the 'geometry' > structure). > > Without this handling, a '--keep-pack' pack would be included in the > 'geometry' structure. If the pack is *before* the geometry split line (with > at least one other pack and/or loose objects present), 'repack' assumes the > pack's contents are "rolled up" into another pack via 'pack-objects'. > However, because the internally-invoked 'pack-objects' properly excludes > '--keep-pack' objects, any new pack it creates will not contain the kept > objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since > it assumes 'pack-objects' created a new pack with its contents), resulting > in possible object loss and repository corruption. Nicely found and explained. Having discussed this fix with you already off-list, this approach (to treat kept packs as excluded from the list of packs in the `geometry` structure regardless of whether they are kept on disk or in-core) makes sense to me. I left a couple of small notes on the patch below, but since I have some patches that deal with a separate issue in the `git repack --geometric` code coming, do you want to combine forces (and I can send a lightly-reworked version of this patch as a part of my series)? > @@ -332,17 +332,34 @@ static int geometry_cmp(const void *va, const void *vb) > return 0; > } > > -static void init_pack_geometry(struct pack_geometry **geometry_p) > +static void init_pack_geometry(struct pack_geometry **geometry_p, > + struct string_list *existing_kept_packs) > { > struct packed_git *p; > struct pack_geometry *geometry; > + struct strbuf buf = STRBUF_INIT; > > *geometry_p = xcalloc(1, sizeof(struct pack_geometry)); > geometry = *geometry_p; > > + string_list_sort(existing_kept_packs); Would it be worth sorting this as early as in collect_pack_filenames()? For our purposes in this patch, this works as-is, but it may be defensive to try and minimize the time that list has unsorted contents. > for (p = get_all_packs(the_repository); p; p = p->next) { > - if (!pack_kept_objects && p->pack_keep) > - continue; > + if (!pack_kept_objects) { > + if (p->pack_keep) > + continue; (You mentioned this to me off-list, but I'll repeat it here since it wasn't obvious to me on first read): this check for `p->pack_keep` isn't strictly necessary, since any packs that have their `pack_keep` bit set will appear in the `existing_kept_packs` list. But it does give us a fast path to avoid having to check that list, so it's worth checking that bit to avoid a slightly more expensive check where possible. > + /* > + * The pack may be kept via the --keep-pack option; > + * check 'existing_kept_packs' to determine whether to > + * ignore it. > + */ > + strbuf_reset(&buf); > + strbuf_addstr(&buf, pack_basename(p)); > + strbuf_strip_suffix(&buf, ".pack"); > + > + if (string_list_has_string(existing_kept_packs, buf.buf)) > + continue; It's too bad that we have to do this check at all, and can't rely on the `pack_keep_in_core` in the same way as we check `p->pack_keep`. But lifting that restriction is a more invasive change, so I'm happy to rely on the contents of existing_kept_packs here in the meantime. > + } > > ALLOC_GROW(geometry->pack, > geometry->pack_nr + 1, > @@ -353,6 +370,7 @@ static void init_pack_geometry(struct pack_geometry **geometry_p) > } > > QSORT(geometry->pack, geometry->pack_nr, geometry_cmp); > + strbuf_release(&buf); > } > > static void split_pack_geometry(struct pack_geometry *geometry, int factor) > @@ -714,17 +732,20 @@ int cmd_repack(int argc, const char **argv, const char *prefix) > strbuf_release(&path); > } > > + packdir = mkpathdup("%s/pack", get_object_directory()); > + packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid()); > + packtmp = mkpathdup("%s/%s", packdir, packtmp_name); > + > + collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs, > + &keep_pack_list); > + Makes sense; we have to initialize existing_kept_packs before arranging the list of packs for the `--geometric` split. And presumably `collect_pack_filenames()` relies on `packdir`, `packtmp_name`, and `packtmp` being setup ahead of time, too. > if (geometric_factor) { > if (pack_everything) > die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a"); > - init_pack_geometry(&geometry); > + init_pack_geometry(&geometry, &existing_kept_packs); > split_pack_geometry(geometry, geometric_factor); > } > > - packdir = mkpathdup("%s/pack", get_object_directory()); > - packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid()); > - packtmp = mkpathdup("%s/%s", packdir, packtmp_name); > - > sigchain_push_common(remove_pack_on_signal); > > prepare_pack_objects(&cmd, &po_args); > @@ -764,9 +785,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix) > if (use_delta_islands) > strvec_push(&cmd.args, "--delta-islands"); > > - collect_pack_filenames(&existing_nonkept_packs, &existing_kept_packs, > - &keep_pack_list); > - > if (pack_everything & ALL_INTO_ONE) { > repack_promisor_objects(&po_args, &names); > > diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh > index bdbbcbf1eca..f5ac23413d5 100755 > --- a/t/t7703-repack-geometric.sh > +++ b/t/t7703-repack-geometric.sh > @@ -180,6 +180,40 @@ test_expect_success '--geometric ignores kept packs' ' > ) > ' > > +test_expect_success '--geometric ignores --keep-pack packs' ' > + git init geometric && > + test_when_finished "rm -fr geometric" && > + ( > + cd geometric && > + > + # Create two equal-sized packs > + test_commit kept && # 3 objects > + test_commit pack && # 3 objects > + > + KEPT=$(git pack-objects --revs $objdir/pack/pack <<-EOF > + refs/tags/kept > + EOF > + ) && > + PACK=$(git pack-objects --revs $objdir/pack/pack <<-EOF > + refs/tags/pack > + ^refs/tags/kept > + EOF > + ) && Nit; we don't care about the name of $PACK, so it would probably be fine to avoid storing the `PACK` variable. We could write these packs with just `git repack -d` after each `test_commit` (which would avoid us having to call `prune-packed`). Does it matter which one is kept? I don't think so, since AFAICT the critical bit is that we mark one of the packs being rolled up as a `--keep-pack`. > + # Prune loose objects that are now packed into PACK and KEEP > + git prune-packed && > + > + git repack --geometric 2 -dm --keep-pack=pack-$KEPT.pack >out && > + > + # Packs should not have changed (only one non-kept pack, no > + # loose objects), but midx should now exist. > + test_i18ngrep "Nothing new to pack" out && Nit; test_i18ngrep here should just be "grep". > + test_path_is_file $midx && > + test_path_is_file $objdir/pack/pack-$KEPT.pack && > + git fsck > + ) Thanks, Taylor