Re: [PATCH 06/10] rev-list: add --allow-partial option to relax connectivity checks

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Wed, 8 Mar 2017 15:10:54 -0500

On 3/8/2017 1:55 PM, Junio C Hamano wrote:
Jeff Hostetler <jeffhost@xxxxxxxxxxxxx> writes:

From: Jeff Hostetler <git@xxxxxxxxxxxxxxxxx>

Teach rev-list to optionally not complain when there are missing
blobs.  This is for use following a partial clone or fetch when
the server omitted certain blobs.

This makes it impossible to tell apart objects missing by design
(because we did an --partial-by-size clone earlier, expecting we can
later fetch from elsewhere when necessary) and objects inaccessible
by accident (because you have a repository corruption), no?

Right.  It will effectively neuter several commands like
index-pack, gc, and fsck WRT missing blobs.

Even though I do very much like the basic "high level" premise to
omit often useless large blobs that are buried deep in the history
we would not necessarily need from the initial cloning and
subsequent fetches, I find it somewhat disturbing that the code
"Assume"s that any missing blob is due to a previous partial clone.
Adding this option smells like telling the users that they are not
supposed to run "git fsck" because a partially cloned repository is
inherently a corrupt repository.

Can't we do a bit better?  If we want to make the world safer again,
what additional complexity is required to allow us to tell the
"missing by design" and "corrupt repository" apart?

I'm open to suggestions here.  It would be nice to extend the
fetch-pack/upload-pack protocol to return a list of the SHAs
(and maybe the sizes) of the omitted blobs, so that a partial
clone or fetch would still be able to be integrity checked.
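
To make the idea concrete, here is a rough sketch of the check such a
list would enable.  It is illustrative only: classify_missing() and the
notion of a locally recorded "omitted by server" list are assumptions
for discussion, not part of this series or of git, and the object IDs
are made-up placeholders.

```python
from typing import Dict, Iterable, List


def classify_missing(
    referenced: Iterable[str],
    present: Iterable[str],
    omitted_by_server: Iterable[str],
) -> Dict[str, List[str]]:
    """Split referenced-but-absent blob IDs into two buckets.

    referenced:        blob IDs reachable from the commits/trees we have
    present:           blob IDs actually found in the local object store
    omitted_by_server: blob IDs the server said it left out (hypothetical
                       list returned by an extended fetch protocol)
    """
    omitted = set(omitted_by_server)
    report: Dict[str, List[str]] = {
        "missing_by_design": [],
        "possibly_corrupt": [],
    }
    # Anything referenced but not present is missing; the recorded list
    # decides whether that absence was expected.
    for oid in sorted(set(referenced) - set(present)):
        if oid in omitted:
            report["missing_by_design"].append(oid)
        else:
            report["possibly_corrupt"].append(oid)
    return report


if __name__ == "__main__":
    result = classify_missing(
        referenced={"a1", "b2", "c3", "d4"},
        present={"a1"},
        omitted_by_server={"b2", "c3"},
    )
    print(result)
```

With something like this, a connectivity check could stay quiet about
the first bucket and still complain about the second, instead of
ignoring every missing blob.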

Jeff