On Tue, Dec 07, 2021 at 10:17:28AM -0500, Derrick Stolee wrote:
> On 11/29/2021 5:25 PM, Taylor Blau wrote:
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > +static int add_cruft_object_entry(const struct object_id *oid, enum object_type type,
> > +				  struct packed_git *pack, off_t offset,
> > +				  const char *name, uint32_t mtime)
> > +{
> > +	struct object_entry *entry;
> > +
> > +	display_progress(progress_state, ++nr_seen);
>
> I don't love the global nr_seen here, but it is pervasive through the
> file. OK.

Yeah; this is how all of the existing progress code works in
pack-objects.

> > +	entry = packlist_find(&to_pack, oid);
> > +	if (entry) {
> > +		if (name) {
> > +			entry->hash = pack_name_hash(name);
> > +			entry->no_try_delta = name && no_try_delta(name);
>
> This is already in an "if (name)" block, so "name &&" isn't needed.

Thanks; this is a copy-and-paste from add_object_entry(), where we
aren't in a conditional on "name".

We could also fold the conditional on whether or not name is NULL into
no_try_delta() itself, since all existing calls look like "name &&
no_try_delta(name)". So adding something like:

	if (!name)
		return 0;

to the beginning of no_try_delta()'s implementation would allow us to
get rid of the handful of "name &&"s. But I'm trying to avoid touching
other parts of pack-objects as much as I can, so I'll hold off for now.

> > +		}
> > +	} else {
> > +		if (!want_object_in_pack(oid, 0, &pack, &offset))
> > +			return 0;
> > +		if (!pack && type == OBJ_BLOB && !has_loose_object(oid)) {
> > +			/*
> > +			 * If a traversed tree has a missing blob then we want
> > +			 * to avoid adding that missing object to our pack.
> > +			 *
> > +			 * This only applies to missing blobs, not trees,
> > +			 * because the traversal needs to parse sub-trees but
> > +			 * not blobs.
> > +			 *
> > +			 * Note we only perform this check when we couldn't
> > +			 * already find the object in a pack, so we're really
> > +			 * limited to "ensure non-tip blobs which don't exist in
> > +			 * packs do exist via loose objects". Confused?
> > +			 */
> > +			return 0;
> > +		}
> > +
> > +		entry = create_object_entry(oid, type, pack_name_hash(name),
> > +					    0, name && no_try_delta(name),
> > +					    pack, offset);
> > +	}
> > +
> > +	if (mtime > oe_cruft_mtime(&to_pack, entry))
> > +		oe_set_cruft_mtime(&to_pack, entry, mtime);
> > +	return 1;
>
> I was confused at this "return 1" here, while other cases return 0.
>
> It turns out that there are multiple methods in this file that have
> different semantics: add_loose_object() and add_object_entry_from_pack()
> are both called from iterators where "return 1" means "stop iterating"
> so they return 0 always. add_object_entry_from_bitmap() is used to
> iterate over a bitmap and "return 1" means "include this object".
>
> However, the return code for add_cruft_object_entry() is never used,
> so it should probably return void or swap the meanings to have nonzero
> mean an error occurred.

Yes, exactly. And thanks for tracing out both of the different
meanings/interpretations of these add_xyz_entry() functions.

As you can imagine, this implementation is copied from
add_object_entry() and specialized for this use. At the time, I made
some effort to share more code with add_object_entry() for this special
case, but it ended up being pretty awkward, hence the separate
implementation.

Ironically, add_object_entry()'s return code is also unused, so we could
probably clean that up, too. But as above, I'll leave it alone for now
in an effort to touch as little of pack-objects in this patch as I can.
> > +static void mark_pack_kept_in_core(struct string_list *packs, unsigned keep)
> > +{
> > +	struct string_list_item *item = NULL;
> > +	for_each_string_list_item(item, packs) {
> > +		struct packed_git *p = item->util;
> > +		if (!p)
> > +			die(_("could not find pack '%s'"), item->string);
>
> Interesting that this is a potential issue. We are expecting the pack
> to be loaded before we get here. Is this more because some packs might
> not actually load, but it's fine as long as we don't mark them as kept?

Not quite "loaded" (though any pack structures that we look at by this
point will be fully "loaded"). Instead, we're making sure that all of
the pack names we read from stdin could be matched to packs that we
found in the repository (e.g., that we produce an appropriate error
message if we see "pack-does-not-exist.pack" on stdin).

This is all because we process input from stdin in two phases:

  - First, read all of the input into two string_lists: one for the
    packs we're about to discard (anything that starts with '-'), and
    another for all of the "fresh" packs (i.e., anything that we're not
    going to discard).

  - Then, loop through all of the packed_git structs we have, querying
    both of the aforementioned string lists for input that matches each
    pack's `pack_name` field, and setting the `->util` pointer of the
    matching string_list_item appropriately.

After those two steps, any list entries that still have a NULL util
pointer correspond to bogus input, so we want to call die() there.

> > +		p->pack_keep_in_core = keep;
> > +	}
> > +}
> ...
> > +static void read_cruft_objects(void)
> > +{
> > +	struct strbuf buf = STRBUF_INIT;
> > +	struct string_list discard_packs = STRING_LIST_INIT_DUP;
> > +	struct string_list fresh_packs = STRING_LIST_INIT_DUP;
> > +	struct packed_git *p;
> > +
> > +	ignore_packed_keep_in_core = 1;
>
> Here is a global that we are suddenly changing. Should we not be
> returning it to its initial state when this method is complete?
We could, although it won't matter in practice: we'll want to keep that
setting around for our traversal, after which point pack-objects will
exit.

> > +static int option_parse_cruft_expiration(const struct option *opt,
> > +					 const char *arg, int unset)
> > +{
> > +	if (unset) {
> > +		cruft = 0;
>
> This unassignment of 'cruft' when cruft-expiration is unset with
> --no-cruft-expiration seems odd. I would expect
>
> 	git pack-objects --cruft --no-cruft-expiration
>
> to still make a cruft pack, but not expire anything. It seems that
> your code here makes --no-cruft-expiration disable the --cruft option.

Hmm. I could see compelling reasoning that goes both ways. On the one
hand, `--no-cruft-expiration` (to me, at least) seems to imply "set
`--cruft-expiration` to 'never'". On the other hand, treating it as
unsetting the option matches our convention of `--no`-prefixed options
unsetting some value.

This implementation takes the latter approach, though we could easily
change it to set the cruft expiration to "never". I don't have a strong
opinion about which is better, so I'm happy to do either if you have a
better sense of which behavior users would expect.

> > +		cruft_expiration = 0;
> > +	} else {
> > +		cruft = 1;
> > +		if (arg)
> > +			cruft_expiration = approxidate(arg);
> > +	}
> > +	return 0;
> > +}
> ...
> > +		OPT_BOOL(0, "cruft", &cruft, N_("create a cruft pack")),
> > +		OPT_CALLBACK_F(0, "cruft-expiration", NULL, N_("time"),
> > +			       N_("expire cruft objects older than <time>"),
> > +			       PARSE_OPT_OPTARG, option_parse_cruft_expiration),
>
> > -static int has_loose_object(const struct object_id *oid)
> > +int has_loose_object(const struct object_id *oid)
> >  {
> >  	return check_and_freshen(oid, 0);
> >  }
>
> I'm surprised this hasn't been modified to use a repository pointer.
> Adding another caller here isn't too much debt, though.

Yeah, check_and_freshen() doesn't have a variant that takes a
repository pointer. Good #leftoverbits, I guess!
> > +int has_loose_object(const struct object_id *);
> > +
> >  void assert_oid_type(const struct object_id *oid, enum object_type expect);
> > ...
>
> > +	test_expect_success "unreachable packed objects are packed (expire $expire)" '
> > +		git init repo &&
> > +		test_when_finished "rm -fr repo" &&
> > +		(
> > +			cd repo &&
> > +
> > +			test_commit packed &&
> > +			git repack -Ad &&
> > +			test_commit other &&
> > +
> > +			git rev-list --objects --no-object-names packed.. >objects &&
> > +			keep="$(basename "$(ls $packdir/pack-*.pack)")" &&
> > +			other="$(git pack-objects --delta-base-offset \
> > +				$packdir/pack <objects)" &&
> > +			git prune-packed &&
> > +
> > +			test-tool chmtime --get -100 "$packdir/pack-$other.pack" >expect &&
>
> I am missing how this test creates _unreachable_ objects. I would expect
> removal of some refs or a 'git reset --hard' somewhere. What am I missing?

For this and the other tests, the so-called "unreachable" objects are
technically reachable, but we can treat them as unreachable by putting
them in the "discard" packs list (or by not mentioning them at all to
`git pack-objects --cruft`).

> > +			# remove the unreachable tree, but leave the commit
> > +			# which has it as its root tree in-tact
>
> nit: "intact" is one word.

Thanks; fixed here and in the other test which was added by this commit.

Thanks,
Taylor