On Thu, Feb 24 2022, Robert Coup via GitGitGadget wrote:

> [...] While a key use case is described
> above for partial clones, a user could also use --repair to fix a corrupted
> object database by performing a refetch of objects that should already be
> present, establishing a better workflow than deleting the local repository
> and re-cloning.
>
> * Using --repair will produce duplicated objects between the existing and
> newly fetched packs, but maintenance will clean them up when it runs
> automatically post-fetch (if enabled).
> * If a user fetches with --repair applying a more restrictive partial clone
> filter than previously (eg: blob:limit=1m then blob:limit=1k) the
> eventual state is a no-op, since any referenced object already in the
> local repository is never removed. More advanced repacking which could
> improve this scenario is currently proposed at [2].

I realize this was probably based on feedback on v1 (I didn't go back
and re-read it, sorry), but I feel strongly that we really should name
this something other than --repair. I don't care much what it ends up
being, as long as it isn't that :) Maybe --expand-filters,
--fleshen-partial or something like that?

So first (and partially as an aside): Is a "noop" negotiator really
what we want at all? Don't we instead want to discover those parts of
our history that are closed under reachability (if any) and say we
HAVE those things during negotiation? (There's a sketch of the
difference at the end of this mail.)

I haven't tested, and maybe that's just more complex. E.g. with a
filter that's excluding >500MB blobs (or whatever) you might have the
full history already, or maybe that 500MB blob was added last week, so
you have almost all of it. But wouldn't that be a lot kinder to server
resources and the network, at the expense of some (presumably rare)
extra local computation?

But secondly, on the "--repair" name: The reason I mentioned it is
that I'd really like us to eventually have an actual "my repo is
screwed, please repair it" mode. But (and I haven't tested, though I'm
pretty sure) this patch series isn't going to give you that. The
reasons are elaborated on in [1]: basically we try really hard to
re-use local data, and due to that and the collision detection we will
often just die hard early in the object walk.

1. https://lore.kernel.org/git/87czo7haha.fsf@xxxxxxxxxxxxxxxxxxx/

But maybe I'm wrong: Have you actually tested this with *broken*
objects, as opposed to just missing ones with repo filters + promisors
in play? Our t/*fsck* and t/*corrupt*/ etc. tests have some of those,
and there's a rough reproduction sketch at the end of this mail.

And maybe I'm making a big deal out of nothing, but I fear that by
naming it --repair and giving it these semantics we'd be closing the
door on things that are actually needed for some of the trickier edge
cases when it comes to repairing a bad repository. Including but not
limited to: having a loose BAD_OBJ and needing to replace it with
another loose object (due to the unpack limit); the branch we're
updating not being readable locally, but being at an OID that's
(re-)included in the incoming pack and is hopefully about to repair
our repository; or even having a SHA-1 collision where we
intentionally want to override the collision detection, because we
know our local repo is bad but the remote can be trusted.

All of which are much more involved than just the "fleshen partial
data" you're aiming for here...
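
For reference, here's the "more restrictive filter" scenario from the
quoted cover letter spelled out as commands. An untested sketch: the
URL and the "client" directory are made up, and --repair is the flag
name as proposed in this series:

    git clone --filter=blob:limit=1m https://example.com/repo.git client
    # Later, refetch with a narrower filter. Blobs between 1k and 1m
    # that we already have stay put, since referenced local objects are
    # never removed, so locally this ends up being a no-op.
    git -C client fetch --repair --filter=blob:limit=1k origin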
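
On the negotiation aside, a minimal sketch of the difference I mean,
in terms of the existing fetch.negotiationAlgorithm knob (the "client"
name is again made up):

    # Roughly the negotiation part of what this series does: send our
    # wants but no "have" lines at all, so the server re-sends
    # everything reachable from the wants.
    git -C client -c fetch.negotiationAlgorithm=noop fetch origin

    # What I'm suggesting has no existing knob: first walk locally to
    # find the parts of history that are closed under reachability,
    # then advertise their tips as "have"s so the server can skip
    # re-sending those objects.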
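
And the sort of *broken* object test I have in mind, as an untested
sketch (the "client"/"origin" names are made up):

    # Create an object the server also has, then corrupt our loose copy.
    git -C client commit --allow-empty -m tip
    git -C client push origin HEAD
    oid=$(git -C client rev-parse HEAD)
    obj=client/.git/objects/$(echo "$oid" | sed 's|^..|&/|')
    chmod u+w "$obj" && echo garbage >"$obj"

    git -C client fsck                   # reports the corrupt object
    git -C client fetch --repair origin  # my bet: this dies in the
                                         # object walk or on the
                                         # collision check, rather than
                                         # replacing the bad loose object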