On 4/25/2023 5:25 PM, Taylor Blau wrote: > On Tue, Apr 25, 2023 at 03:42:51PM -0400, Derrick Stolee wrote: >> On 4/20/2023 1:27 PM, Taylor Blau wrote: >>> The implementation is straightforward: after determining the set of >>> objects to pack into the cruft pack (either by calling >>> `enumerate_cruft_objects()` or `enumerate_and_traverse_cruft_objects()`) >>> we figure out if there are any program(s) we need to consult in order to >>> determine if there are any objects which we need to "rescue". We then >>> add those as tips to another reachability traversal, marking every >>> object along the way as cruft (and thus retaining it in the cruft pack). >> >> I initially wondered why we would need a second walk, and I _think_ >> one reason is that we don't want to update the mtimes for these >> objects as if they were reachable. The next point about the reachable >> pack is also critical. > > Their mtimes might be updated during that second walk, but the key point > is that we want to rescue any objects that the extra tips might reach > (but aren't listed themselves in the output of one of the hooks). > > The idea there is similar to the rescuing pass we do for pruning GC's, > where we'll save any objects which would have been pruned, but are > reachable from another object which won't yet be pruned. > >>> Instead, keep these unreachable objects in the cruft pack instead, to >>> ensure that we can continue to have a pack containing just reachable >>> objects, which is always safe to write a bitmap over. >> >> Makes sense. Also: don't update their mtime so they could be removed >> immediately if the external pointers change. > > I don't think I'm quite following why we would want to leave their > mtimes alone here. I think we'd want their mtimes to be accurate as of > the last time we updated the *.mtimes file so that if they (a) no longer > appear in the output of one of the extra-cruft hooks, and (b) they have > been written recently, that they would not be pruned immediately. You corrected me that we _are_ updating mtimes, so my understanding was incorrect and this follow-up is incorrect. Thanks. Is it possible to demonstrate this mtime-updating in a test? >>> + if (progress) >>> + progress_state = start_progress(_("Enumerating extra cruft tips"), 0); >> >> Should we be including two progress counts? or would it be sufficient >> to say "traversing extra cruft objects" and include both of these steps >> in that one progress mode? > > I may be missing something, but there are already two progress meters > here: one (above) for enumerating the list of extra cruft tips, which > increments each time we read a new line that refers to an extant object, > and another (below) for the number of objects that we visit along the > second walk. I meant: why two counts? Should we instead only have one? >>> + if (prepare_revision_walk(&revs)) >>> + die(_("revision walk setup failed")); >>> + if (progress) >>> + progress_state = start_progress(_("Traversing extra cruft objects"), 0); >>> + nr_seen = 0; >>> + traverse_commit_list(&revs, show_cruft_commit, show_cruft_object, NULL); >> >> Hm. I'm trying to look at these helpers and figure out what they >> are doing to prevent these objects from being pruned. This could >> use a comment or something, as I'm unable to grok it at present. > > Very fair, the show_cruft_object() callback calls > add_cruft_object_entry(), which adds the visited object to the cruft > pack regardless of its mtime with respect to pruning. Ah, thanks. That helps a lot. > A comment here might look like: > > /* retain all objects reachable from extra tips via show_cruft_object() */ > > or something. Sounds good. >> Could we store this commit to make sure it is removed from the >> repository later? > > Possibly, though I think we already have good coverage of this in other > tests. So I'd be equally happy to assert on the exact contents of the > cruft pack (which wouldn't include the above commit), but I'm happy to > assert on it more directly if you feel strongly. This is a case of me thinking "can we test the test?" to make sure the test is doing everything we think it is doing... >>> + git checkout --orphan old && >>> + git rm -fr . && >>> + test_commit --no-tag cruft.old && >>> + cruft_old="$(git rev-parse HEAD)" && >>> + >>> + git checkout --orphan new && >>> + git rm -fr . && >>> + test_commit --no-tag cruft.new && >>> + cruft_new="$(git rev-parse HEAD)" && >>> + >>> + git checkout main && >>> + git branch -D old new && >> >> Do you need to delete the 'discard' branch here? > > Great catch. I didn't notice it here because even though it was > reachable (and not mentioned as an extra tip), so didn't appear in the > cruft pack. ...so we catch things like this automatically. >>> + mtimes="$(ls .git/objects/pack/pack-*.mtimes)" && >>> + git show-index <${mtimes%.mtimes}.idx >cruft && >>> + cut -d" " -f2 cruft | sort >cruft.actual && >>> + test_cmp cruft.expect cruft.actual >> >> One thing that would be good, as a safety check, is to remove >> the pack.extraCruftTips config and re-run the repack and be >> sure that we are pruning the objects previously saved by the >> hook. > > Good call. I think this amounts to: > > --- 8< --- > diff --git a/t/t5329-pack-objects-cruft.sh b/t/t5329-pack-objects-cruft.sh > index 1260e11d41..44852e6925 100755 > --- a/t/t5329-pack-objects-cruft.sh > +++ b/t/t5329-pack-objects-cruft.sh > @@ -773,7 +773,7 @@ test_expect_success 'additional cruft tips may be specified via pack.extraCruftT > cruft_new="$(git rev-parse HEAD)" && > > git checkout main && > - git branch -D old new && > + git branch -D discard old new && > git reflog expire --all --expire=all && > > # mark cruft.old with an mtime that is many minutes > @@ -802,6 +802,19 @@ test_expect_success 'additional cruft tips may be specified via pack.extraCruftT > mtimes="$(ls .git/objects/pack/pack-*.mtimes)" && > git show-index <${mtimes%.mtimes}.idx >cruft && > cut -d" " -f2 cruft | sort >cruft.actual && > + test_cmp cruft.expect cruft.actual && If we store "discard=$(git rev-parse discard)" earlier, we can also check test_must_fail git show $discard && > + # Ensure that the "old" objects are removed after > + # dropping the pack.extraCruftTips hook. > + git config --unset pack.extraCruftTips && > + git repack --cruft --cruft-expiration=now -d && > + > + mtimes="$(ls .git/objects/pack/pack-*.mtimes)" && > + git show-index <${mtimes%.mtimes}.idx >cruft && > + cut -d" " -f2 cruft | sort >cruft.actual && > + > + git rev-list --objects --no-object-names $cruft_new >cruft.raw && I like how $cruft_new is still kept because its mtime is still "later than now". > + sort cruft.raw >cruft.expect && > test_cmp cruft.expect cruft.actual > ) > ' > --- >8 --- > >> It would be helpful to include a test for the multi-value case where >> multiple scripts are executed and maybe include an "exit 1" in some >> of them? > > Definitely, that is a great suggestion (especially because it would have > caught the "if (!ret)" bug that you pointed out above). > > Thanks for a thorough review. I'll wait for other feedback before > re-rolling, or do so in a couple of days if I haven't heard anything. Thanks, -Stolee