Re: [PATCH v6 17/19] fsck: Introduce `git fsck --quick`

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 21 Jun 2015 13:35:41 -0700

Johannes Schindelin <johannes.schindelin@xxxxxx> writes:

> On 2015-06-21 19:15, Junio C Hamano wrote:
>> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
>> That's brilliant.
>> 
>> Just to make sure I am reading you correctly, you mean the current
>> overall structure:
>> 
>> [...]
>
> The way I read Michael's mail, he actually meant something different:
> if all of the blob-related errors/warnings are switched to "ignore",
> simply skip unpacking the blobs.

That is how I read his mail, too.

But because, IIRC, we do not check anything about a blob other than
that we can read it correctly, my description of the "overall
structure" stayed at a very high conceptual level.  The unpacking may
happen at a much higher level in the code, i.e. it may come well
before this part of the logic flow:

	if ("is bad_blob ignored?")
		;
	else if (! "is the blob loadable and well-formed?") {

in which case the "is bad_blob ignored?" check may have to happen
before we unpack the object.

And I do not suggest introducing yet another BAD_BLOB error class; I
would have guessed that you already have an error class for objects
that are not stored correctly (be it truncated loose object, checksum
mismatch in the packed base object, or corrupt delta in pack).

It so happens that blob is the only type of object that does not
have outgoing links that are needed for the connectivity check, so
even if you allow ignoring the "error class for objects that are not
stored correctly", you would still have to read trees, commits and
tags; it would be a natural consequence of ignoring that class of
errors that you would get a quick-and-dirty fsck by not unpacking
blobs.
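
To illustrate the paragraph above, here is a rough sketch, not code
from the series.  Every name in it (obj_kind, must_unpack,
count_unpacked) is made up for illustration, and it assumes the kind
of each object is already known without unpacking it:

```c
#include <stddef.h>

/* Hypothetical object kinds; illustrative only, not git's own enum. */
enum obj_kind { KIND_COMMIT, KIND_TREE, KIND_TAG, KIND_BLOB };

/*
 * Only blobs lack outgoing links, so only blobs may be skipped when
 * the "object is not stored correctly" class of errors is ignored.
 */
int must_unpack(enum obj_kind kind, int ignore_storage_errors)
{
	if (kind != KIND_BLOB)
		return 1;	/* still needed for the connectivity check */
	return !ignore_storage_errors;
}

/* Count how many of the n objects one pass would have to unpack. */
size_t count_unpacked(const enum obj_kind *kinds, size_t n,
		      int ignore_storage_errors)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (must_unpack(kinds[i], ignore_storage_errors))
			count++;
	return count;
}
```

With the error class ignored, trees, commits and tags are all still
unpacked; only the blobs drop out of the count.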

Of course, that assumes that you can tell an object is a blob
without unpacking.  If a tree entry mentions an object to be a blob
by having 100644 as its mode, unless you unpack the object pointed
at by that tree entry to make sure it is a blob, you wouldn't be
able to detect a case where a non-blob object is stored with 100644
mode, which would be an error in the containing tree object that we
may want to detect.  I am not sure if "skip inflating blobs, but
still ensure connectivity and tree integrity" is really a viable
mode of quick-and-dirty operation.  I would imagine you would need
to lose a bit more than "we don't bother reading blobs".  That is OK
by me; I am just pointing out that (1) I do not mean to say we
should add BAD_BLOB as a new class, and (2) the checks that Michael's
--quick bypasses automatically may not be limited to the "we cannot
read this blob object" class, but may also need to include checks
for some forms of tree integrity violation.
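
A sketch of the tree-entry problem, again with made-up names
(real_type, verdict, expected_type, check_entry) and not taken from
the series.  REAL_UNKNOWN stands for "we did not look at the object",
which is an assumption about how a quick mode might behave:

```c
/* Hypothetical result of inspecting the object an entry points at. */
enum real_type { REAL_UNKNOWN, REAL_BLOB, REAL_TREE, REAL_COMMIT };

enum verdict { ENTRY_OK, ENTRY_MISMATCH, ENTRY_UNCHECKED };

/* Map a tree entry mode to the object type it promises. */
enum real_type expected_type(unsigned int mode)
{
	switch (mode) {
	case 0100644:	/* regular file */
	case 0100755:	/* executable file */
	case 0120000:	/* symbolic link, stored as a blob */
		return REAL_BLOB;
	case 040000:	/* subdirectory */
		return REAL_TREE;
	case 0160000:	/* gitlink, names a commit */
		return REAL_COMMIT;
	default:
		return REAL_UNKNOWN;	/* mode not handled by this sketch */
	}
}

/*
 * Compare what the mode promises with what the object really is.
 * If the object was never inspected, the mismatch cannot be seen.
 */
enum verdict check_entry(unsigned int mode, enum real_type actual)
{
	enum real_type expected = expected_type(mode);

	if (expected == REAL_UNKNOWN)
		return ENTRY_MISMATCH;
	if (actual == REAL_UNKNOWN)
		return ENTRY_UNCHECKED;
	return expected == actual ? ENTRY_OK : ENTRY_MISMATCH;
}
```

The ENTRY_UNCHECKED case is the point: a 100644 entry pointing at a
non-blob goes unnoticed unless the object is looked at somehow.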

Thanks.
