On Tue, May 02, 2023 at 08:09:47PM -0400, Taylor Blau wrote: > However, there is no option to be able to keep around certain objects > that have otherwise aged out of the grace period. The only way to retain > those objects is: > > - to point a reference at them, which may be undesirable or > infeasible, > - to track them via the reflog, which may be undesirable since the > reflog's lifetime is limited to that of the reference it's tracking > (and callers may want to keep those unreachable objects around for > longer) > - to extend the grace period, which may keep around other objects that > the caller *does* want to discard, > - or, to force the caller to construct the pack of objects they want > to keep themselves, and then mark the pack as kept by adding a > ".keep" file. OK, I understand the use case you're trying to support, and your approach mostly makes sense. But there are two things I was surprised by in the implementation: 1. Does this need to be tied to cruft packs? The same logic would apply to "repack -A" which turns objects loose (of course you probably don't want to do that in the long term for performance reasons, but it's conceptually the same thing; see below). 2. Why is there a separate walk distinct from the one that rescues recent-but-unreachable objects? Conceptually it seems to me that the simplest and most flexible way to think of this new feature is: pretend these objects are recent enough to be kept in the grace period, even though their mtimes do not qualify". And then everything else would just fall out naturally. Am I missing something? In a pre-cruft-pack world, I'd have just expected the patch to look like this: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index bd6ad016d6..1d655dc758 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -4077,6 +4077,7 @@ static void get_object_list(struct rev_info *revs, int ac, const char **av) if (add_unseen_recent_objects_to_traversal(revs, unpack_unreachable_expiration, NULL, 0)) die(_("unable to add recent objects")); + add_tips_from_program(revs); if (prepare_revision_walk(revs)) die(_("revision walk setup failed")); traverse_commit_list(revs, record_recent_commit, Sadly the cruft-pack feature doesn't really share this code, but I think it does something similar, and could just consider those synthetic tips as "recent" for this run. > +static int enumerate_extra_cruft_tips_1(struct rev_info *revs, const char *args) > +{ > + struct child_process cmd = CHILD_PROCESS_INIT; > + struct strbuf buf = STRBUF_INIT; > + FILE *out = NULL; > + int ret = 0; > + > + cmd.use_shell = 1; > + cmd.out = -1; > + > + strvec_push(&cmd.args, args); > + strvec_pushv(&cmd.env, (const char **)local_repo_env); Why does this clear the environment of local-repo variables? That seems unlike most other hooks we have, and would cause confusion if $GIT_DIR or various config variables are important (e.g., should "git -c foo.bar gc" persist foo.bar when running the hook? It usually does). I know that some hooks that try to change repositories by changing directories have the opposite confusion ($GIT_DIR is set, but they did not want it). But it makes more sense to me to remain consistent with how other hooks behave. > + if (start_command(&cmd)) { > + ret = -1; > + goto done; > + } This may be a matter of taste, but you can "return -1" directly here, as nothing has been allocated (and a failed start_command() will call child_process_clear() for you). This would mean "out" is always set in the "done:" label, so it wouldn't need a NULL-check (nor an initialization). > + > + out = xfdopen(cmd.out, "r"); > + while (strbuf_getline_lf(&buf, out) != EOF) { is there any reason to be a stickler for LF versus CRLF here? I.e., wouldn't strbuf_getline() be more friendly, with little chance that we misinterpret the result? > + struct object_id oid; > + struct object *obj; > + const char *rest; > + > + if (parse_oid_hex(buf.buf, &oid, &rest) || *rest) { > + ret = error(_("invalid extra cruft tip: '%s'"), buf.buf); > + goto done; > + } > + > + obj = parse_object(the_repository, &oid); > + if (!obj) > + continue; This parse_object() can be pretty expensive, especially if you are rescuing a lot of objects (or if your program directly references large blobs). Can we use oid_object_info() here in combination with lookup_object_by_type()? -Peff