Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> OK I think I get what you are trying to say.
> ...

The attack can be simplified even further; the other side needs to
know about only one blob.

Suppose you have a corrupt blob B that is not referenced from anything
in your repository. "git fsck" will find the corruption of that single
blob, but that does not make your repository corrupt, as the corrupt
object is irrelevant to your history. The tip of your current healthy
history is at commit T.

Starting from that state, you fetch from the other side, which has
commit X at its tip. In this simplified scenario, X is a direct child
of T. You expect the other side to send everything contained in X that
you do not have in T.

Now, the only difference X makes relative to T is that it adds a new
file whose content is B at the toplevel of the tree. The transfer
gives you the commit object X itself and its toplevel tree object, but
it omits the blob B by malice (or mistake).

Your "rev-list --objects T..X" that is run after the transfer
completes will not notice that B is corrupt, because it only checks
whether B exists. And now you have corrupted your repository, by
making B a part of the history you (incorrectly) declare complete.

The whole point of the check after the transfer is to make sure that
the transfer will not turn a repository that was healthy into a
corrupt one, so using --objects and not --verify-objects is a wrong
"optimization" in this case.

> Not everything is valid, then. Objects from new packs are, existing
> ones may be guilty. If there is a way to mark new packs trusted, then
> we only need to validate the other objects, which should be the
> minority or even empty unless an attack is mounted.

Yes, but how? Running "fsck" on all of the pre-existing objects every
time you fetch (or accept a push) is not an answer.
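
To make the difference between the two checks in the scenario above
concrete, here is a minimal Python sketch. It is not git's code: the
object store is modelled as a plain dict, and object_id(), exists() and
verify() are hypothetical helpers named for this illustration. The only
thing taken from git is the blob naming rule, SHA-1 over
"blob <size>\0" followed by the content.

```python
import hashlib

def object_id(kind: str, payload: bytes) -> str:
    """Name an object the way git does: SHA-1 of '<kind> <size>\\0' + payload."""
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()

def exists(store: dict, oid: str) -> bool:
    """Existence-only check, standing in for what --objects relies on."""
    return oid in store

def verify(store: dict, oid: str) -> bool:
    """Rehash the stored bytes, standing in for --verify-objects."""
    entry = store.get(oid)
    if entry is None:
        return False
    kind, payload = entry
    return object_id(kind, payload) == oid

# B is the name of the healthy blob that commit X's tree refers to.
B = object_id("blob", b"hello\n")

# The local repository holds a corrupt copy under that name, and the
# transfer omitted the healthy one.
store = {B: ("blob", b"hellp\n")}

needed_by_X = [B]
existence_ok = all(exists(store, oid) for oid in needed_by_X)
verified_ok = all(verify(store, oid) for oid in needed_by_X)
```

Here existence_ok is true while verified_ok is false, which is the gap
the attack slips through: the existence-only walk declares T..X
complete even though the bytes stored under B's name do not hash to B.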

If your fetch did not explode the incoming pack into pieces, one
possibility is to still use the --verify-objects codepath, but pass
the identity of the pack (e.g. struct packed_git) down the callchain
so that you can avoid rehashing the objects that came from that single
pack. That would not help the case where you ended up calling
unpack-objects, though.

I also suspect that a more than trivial amount of computation is
needed to determine whether a given object exists only in a single
pack, so the end result may not be that much cheaper than the current
--verify-objects code.
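
A rough Python sketch of the "skip rehashing what came from the new
pack" idea, under loud assumptions: packs are modelled as named sets of
object names, loose objects as another set, and only_in_pack() and
objects_to_rehash() are hypothetical helpers, not anything in git. It
mainly shows why the saving is uncertain: deciding that an object lives
only in the trusted pack means probing every other place it could be.

```python
def only_in_pack(oid, trusted, packs, loose):
    """True if oid is found in the trusted pack and nowhere else."""
    if oid in loose:
        return False
    # Every pack has to be probed; this is the non-trivial cost.
    holders = [name for name, oids in packs.items() if oid in oids]
    return holders == [trusted]

def objects_to_rehash(needed, trusted, packs, loose):
    """Objects that still need a full hash check after the transfer."""
    return [oid for oid in needed
            if not only_in_pack(oid, trusted, packs, loose)]

# Illustrative names only; real object names are hashes.
packs = {
    "pack-new": {"X", "tree-of-X"},   # the pack that just arrived
    "pack-old": {"T", "tree-of-X"},   # a pre-existing pack
}
loose = {"B"}                         # the pre-existing corrupt blob

needed = ["X", "tree-of-X", "B"]
to_rehash = objects_to_rehash(needed, "pack-new", packs, loose)
```

In this toy setup only X is skipped. The tree also exists in an older
pack and B exists only as a pre-existing loose object, so both still
have to be rehashed.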