Hi Glen, Apologies for my delayed response. Now that we are on the other side of 2.41 I think that it's the right time to pick this back up again. On Wed, May 24, 2023 at 04:21:34PM -0700, Glen Choo wrote: > Hi Taylor! It was great seeing you at Review Club today :) It was fun :-). > It would be useful to flesh out why keeping these extra refs are > either "undesirable" or "infeasible". Presumably, you already have > some idea of why this is the case for the GitHub 'audit log'. Yes, thanks for suggesting it. I think the gap in my explanation is "[...] if there are many such objects". Hopefully that clarifies it. > Another potential use case I can think of is client-side third party Git > tools that implement custom workflows on top of a Git repo, e.g. > git-branchless (https://github.com/arxanas/git-branchless) and jj > (https://github.com/martinvonz/jj/, btw I contribute a little to jj > too). I thought that this was an interesting part of the discussion. I hadn't thought of it when writing up these patches, but I think that it could be potentially useful for those tools if they want to keep around some precious set of metadata objects without having to point refs at them. It also introduces a little bit of a higher barrier between the tool and user to destroy those objects. Without pinning them with this hook, all a user has to do to remove them is drop the reference(s) which points at them, and then GC. Now they'd have to modify the hook, etc. > > +gc.recentObjectsHook:: > > I have a small preference to use "command" instead of "hook" to avoid > confusion with Git hooks (I've already observed some of this confusion > in off-list conversations). There's some precedent for "hook" in > `uploadpack.packObjectsHook`, but that's a one-off (and it used to > confuse me a lot too :P). Unless you feel strongly, let's leave it as-is. "gc.recentObjectsHook" is the third iteration of this name, and I'd like to avoid spending much more time on naming if we can help it. > > + When considering the recency of an object (e.g., when generating > > + a cruft pack or storing unreachable objects as loose), use the > > + shell to execute the specified command(s). Interpret their > > + output as object IDs which Git will consider as "recent", > > + regardless of their age. > > >From a potential user's POV, two things seem unclear: > > - What does "recenct" mean in this context? Does it just mean > "precious"? > - What objects do I need to list? Is it every object I want to keep or > just the reachable tips? To answer your questions: recency is referring to the "mtime" of an object [^1], not whether or not it is precious. I clarified this by removing the term "recent" from this sentence altogether, to instead read: "When considering whether or not to remove an object [...]" You only need to list the tips, since Git will treat the output of the hook as input to a traversal which allows for missing objects. Any object visited along that traversal will be kept in the repository and rescued from deletion. I tried to clarify this by adding a final sentence (emphasis mine): "By treating their mtimes as "now", any objects **(and their descendants)** mentioned in the output will be kept regardless of their true age." > In the code changes, I noticed a few out-of-date references to "cruft > tips", but everything else looked reasonable to me. Thanks, I'll clean those up and resubmit it with the above fixes squashed in. Thanks, Taylor [^1]: an object's mtime is the most recent of (1) the st_mtime of a its loose object file, if it exists, (2) the st_mtime of any non-cruft pack(s) that contain that object, or (3) the value listed in the .mtimes file of the cruft pack corresponding to that object's entry.