Re: [PATCH v2] Perform cheaper connectivity check when pack is used as medium

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 02 Mar 2012 09:31:56 -0800

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> OK I think I get what you are trying to say.
> ...

The attack can be simplified even further; the other side needs to know
about only one blob.

Suppose you have a corrupt blob B that is not referenced from anything in
your repository. "git fsck" will find the corruption of that single blob,
but that does not make your repository corrupt, as the corrupt object is
irrelevant to your history. The tip of your current healthy history is at
commit T.

Starting from that state, you fetch from the other side, that has commit X
at the tip. In this simplified scenario, X is a direct child of T.

You expect that the other side sends everything contained in X that you do
not have in T.  Now, the only difference X makes relative to T is that it
adds a new file whose content is B at the toplevel of the tree.  The
transfer gives you the commit object X itself and its toplevel tree
object, but it omits the blob B, out of malice (or by mistake).

Your "rev-list --objects T..X", run after the transfer completes, will
not notice that B is corrupt, because it only checks that B exists.

And now you have corrupted your repository by making B part of the
history you (incorrectly) declare complete.

The whole point of the check after the transfer is to make sure that the
transfer will not make a repository that was healthy into a corrupt one,
so using --objects and not --verify-objects is a wrong "optimization" in
this case.
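A toy model of the scenario above may make the difference concrete. It is
only a sketch under loud assumptions: objects live in a flat in-memory
array, they are named by FNV-1a instead of SHA-1, and the hypothetical
helpers check_exists() and check_verify() merely mimic what
"rev-list --objects" and "rev-list --verify-objects" decide; none of these
names are git's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, standing in for SHA-1 in this toy model. */
static uint32_t toy_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* An object as stored: the name it is filed under and the bytes on disk. */
typedef struct {
    uint32_t name;
    const char *content;
} object;

static const object *lookup(const object *store, size_t n, uint32_t name)
{
    for (size_t i = 0; i < n; i++)
        if (store[i].name == name)
            return &store[i];
    return NULL;
}

/* Mimics the --objects decision: every reachable object merely has to exist. */
static bool check_exists(const object *store, size_t n,
                         const uint32_t *names, size_t k)
{
    for (size_t i = 0; i < k; i++)
        if (!lookup(store, n, names[i]))
            return false;
    return true;
}

/* Mimics the --verify-objects decision: each object is also rehashed. */
static bool check_verify(const object *store, size_t n,
                         const uint32_t *names, size_t k)
{
    for (size_t i = 0; i < k; i++) {
        const object *o = lookup(store, n, names[i]);
        if (!o || toy_hash(o->content) != names[i])
            return false;
    }
    return true;
}

/* B sat in the store already corrupt; the transfer sent only X and its tree. */
static void scenario(bool *exists_ok, bool *verify_ok)
{
    uint32_t b = toy_hash("blob B");
    object store[] = {
        { toy_hash("commit T"), "commit T" },
        { b, "bit-rotted bytes" },            /* name no longer matches content */
        { toy_hash("commit X"), "commit X" }, /* received */
        { toy_hash("tree X"), "tree X" },     /* received; conceptually names B */
    };
    /* Objects reachable from X but not from T, i.e. what T..X lists. */
    uint32_t t_to_x[] = { toy_hash("commit X"), toy_hash("tree X"), b };
    size_t n = sizeof(store) / sizeof(store[0]);
    size_t k = sizeof(t_to_x) / sizeof(t_to_x[0]);

    *exists_ok = check_exists(store, n, t_to_x, k);
    *verify_ok = check_verify(store, n, t_to_x, k);
}
```

Calling scenario() leaves exists_ok true and verify_ok false: the
existence-only walk accepts X, while the rehashing walk rejects it at B.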

> Not everything is valid, then. Objects from new packs are, existing
> ones may be guilty. If there is a way to mark new packs trusted, then
> we only need to validate the other objects, which should be the
> minority or even empty unless an attack is mounted.

Yes, but how?  Running "fsck" on all of pre-existing objects every time
you fetch (or accept push) is not an answer.

If your fetch did not explode the incoming pack into pieces, one
possibility is to still use the --verify-objects codepath, but pass the
identity of the pack (e.g. struct packed_git) down the callchain so that
you can avoid rehashing the objects that came from that single pack.  That
would not help when the fetch ended up calling unpack-objects, though.

I also suspect that determining whether a given object exists in only a
single pack needs a more than trivial amount of computation, so the end
result may not be much cheaper than the current --verify-objects code.
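The pack-identity idea and the cost concern can be illustrated with another
toy model, again hypothetical throughout: FNV-1a instead of SHA-1, packs as
flat arrays searched linearly (real pack indexes are searched differently),
and invented names verify_objects() and NO_TRUSTED_PACK.  Copies found in
the trusted incoming pack are not rehashed, but every other pack still has
to be probed for each object, because a stale and possibly corrupt copy
elsewhere could be the one a later reader picks.  With no trusted pack, as
after unpack-objects, everything is rehashed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TRUSTED_PACK ((size_t)-1) /* e.g. objects were exploded loose */

/* FNV-1a, standing in for SHA-1 in this toy model. */
static uint32_t toy_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

typedef struct { uint32_t name; const char *content; } entry;
typedef struct { const entry *entries; size_t n; } pack;

static const entry *find_in_pack(const pack *p, uint32_t name)
{
    for (size_t i = 0; i < p->n; i++)
        if (p->entries[i].name == name)
            return &p->entries[i];
    return NULL;
}

/*
 * Check every object in names[].  Copies in the trusted pack are not
 * rehashed, but all packs are probed so that another copy is not missed.
 * Reports how many pack probes and rehashes were needed.
 */
static bool verify_objects(const pack *packs, size_t npacks, size_t trusted,
                           const uint32_t *names, size_t k,
                           size_t *probes, size_t *rehashes)
{
    *probes = *rehashes = 0;
    for (size_t i = 0; i < k; i++) {
        bool found = false;
        for (size_t p = 0; p < npacks; p++) {
            const entry *e = find_in_pack(&packs[p], names[i]);
            (*probes)++;
            if (!e)
                continue;
            found = true;
            if (p == trusted)
                continue; /* came from the incoming pack: skip the rehash */
            (*rehashes)++;
            if (toy_hash(e->content) != names[i])
                return false; /* a copy elsewhere is corrupt */
        }
        if (!found)
            return false; /* missing object */
    }
    return true;
}
```

In this model the saving is only in rehashing: the probe count stays at one
lookup per object per pack whether or not a pack is trusted.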