On Thu, Feb 24 2022, Robert Coup via GitGitGadget wrote:

> [...] While a key use case is described
> above for partial clones, a user could also use --repair to fix a corrupted
> object database by performing a refetch of objects that should already be
> present, establishing a better workflow than deleting the local repository
> and re-cloning.
>
> * Using --repair will produce duplicated objects between the existing and
> newly fetched packs, but maintenance will clean them up when it runs
> automatically post-fetch (if enabled).
> * If a user fetches with --repair applying a more restrictive partial clone
> filter than previously (eg: blob:limit=1m then blob:limit=1k) the
> eventual state is a no-op, since any referenced object already in the
> local repository is never removed. More advanced repacking which could
> improve this scenario is currently proposed at [2].

I realize this was probably based on feedback on v1 (I didn't go back
and re-read it, sorry), but I feel strongly that we really should name
this something other than --repair. I don't care much what it ends up
being, as long as it isn't that :) Maybe --expand-filters,
--fleshen-partial or something like that?

So first (and partially as an aside): Is a "noop" negotiator really
what we want at all? Don't we instead want to discover those parts of
our history that are closed under reachability (if any) and say we
HAVE those things during negotiation? (There's a sketch of the
difference at the end of this mail.)

I haven't tested, and maybe that's just more complex. E.g. with a
filter that's excluding >500MB blobs (or whatever) you might have the
full history already, or maybe that 500MB blob was added last week, so
you have almost all of it. But wouldn't that be a lot kinder to server
resources and the network, at the expense of some (presumably rare)
extra local computation?

But secondly, on the "--repair" name: The reason I mentioned it is
that I'd really like us to eventually have an actual "my repo is
screwed, please repair it" mode. But (and I haven't tested, though I'm
pretty sure) this patch series isn't going to give you that. The
reasons are elaborated on in [1]: basically we try really hard to
re-use local data, and due to that and the collision detection we will
often just die hard early in the object walk.

1. https://lore.kernel.org/git/87czo7haha.fsf@xxxxxxxxxxxxxxxxxxx/

But maybe I'm wrong: Have you actually tested this with *broken*
objects, as opposed to just missing ones with repo filters + promisors
in play? Our t/*fsck* and t/*corrupt*/ etc. tests have some of those,
and there's a rough reproduction sketch at the end of this mail.

And maybe I'm making a big deal out of nothing, but I fear that by
naming it --repair and giving it these semantics we'd be closing the
door on things that are actually needed for some of the trickier edge
cases when it comes to repairing a bad repository. Including but not
limited to: having a loose BAD_OBJ and needing to replace it with
another loose object (due to the unpack limit); the branch we're
updating not being readable locally, but being at an OID that's
(re-)included in the incoming pack and is hopefully about to repair
our repository; or even having a SHA-1 collision where we
intentionally want to override the collision detection, because we
know our local repo is bad but the remote can be trusted.

All of which are much more involved than just the "fleshen partial
data" you're aiming for here...
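
For reference, here's the "more restrictive filter" scenario from the
quoted cover letter spelled out as commands. An untested sketch: the
URL and the "client" directory are made up, and --repair is the flag
name as proposed in this series:

    git clone --filter=blob:limit=1m https://example.com/repo.git client
    # Later, refetch with a narrower filter. Blobs between 1k and 1m
    # that we already have stay put, since referenced local objects are
    # never removed, so locally this ends up being a no-op.
    git -C client fetch --repair --filter=blob:limit=1k origin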
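
On the negotiation aside, a minimal sketch of the difference I mean,
in terms of the existing fetch.negotiationAlgorithm knob (the "client"
name is again made up):

    # Roughly the negotiation part of what this series does: send our
    # wants but no "have" lines at all, so the server re-sends
    # everything reachable from the wants.
    git -C client -c fetch.negotiationAlgorithm=noop fetch origin

    # What I'm suggesting has no existing knob: first walk locally to
    # find the parts of history that are closed under reachability,
    # then advertise their tips as "have"s so the server can skip
    # re-sending those objects.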
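
And the sort of *broken* object test I have in mind, as an untested
sketch (the "client"/"origin" names are made up):

    # Create an object the server also has, then corrupt our loose copy.
    git -C client commit --allow-empty -m tip
    git -C client push origin HEAD
    oid=$(git -C client rev-parse HEAD)
    obj=client/.git/objects/$(echo "$oid" | sed 's|^..|&/|')
    chmod u+w "$obj" && echo garbage >"$obj"

    git -C client fsck                   # reports the corrupt object
    git -C client fetch --repair origin  # my bet: this dies in the
                                         # object walk or on the
                                         # collision check, rather than
                                         # replacing the bad loose object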