Jeff King <peff@xxxxxxxx> writes:

>> If a malicious site can craft two packfiles that hash to the same,
>> then it can first feed one against a fetch request, then feed the
>> other one when a later fetch request comes, and then the later pack
>> is discarded by the "existing data wins" rule.  Even though this
>> later pack may have all the objects necessary to complete the refs
>> this later fetch request asked for, and an earlier pack that
>> happened to have the same pack trailer hash doesn't have these
>> necessary objects, the receiver ends up discarding this necessary
>> pack.  The end result is that the receiver's repository is now
>> corrupt, lacking some objects.
>
> No, I don't think so. We don't trust the trailer hash for anything to do
> with corruption; we actually inflate the objects and see which ones we
> got. So the victim will notice immediately that what the attacker sent
> it is insufficient to complete the fetch (or push), and will refuse to
> update the refs. The fetch fails, but nobody gets corrupted.

In the scenario I was presenting, both the original fetch that gives
one packdata and the later fetch that gives another packdata (which
happens to share the csum-file trailing checksum) satisfy the "does
the new pack give us enough objects to really complete the tips of
refs?" check.

The second fetch transfers, and we validate the packdata using
index-pack (we may pass --check-self-contained-and-connected, and we
would pass --strict if transfer.fsckObjects is set); we perhaps even
store it in the quarantine area while adding it to the list of
in-core packs, and make sure everything is now connected from the
refs using the pre-existing packs and this new pack.  index-pack may
see that everything is good, and would then report the resulting
pack name back to index_pack_lockfile() called by
fetch-pack.c::get_pack().  That was the scenario I had in mind.
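As an illustration (not from the original exchange), here is a small
Python sketch of the "name the pack after its trailer checksum, and
existing data wins" rule that the scenario hinges on; trailer_name()
and store_pack() are hypothetical stand-ins for what index-pack does
when it moves a pack into the object store:

```python
import hashlib
import os

def trailer_name(pack_payload: bytes) -> str:
    # Git derives the final pack name from a hash of the pack
    # contents (the trailer checksum); modeled here with SHA-1
    # over the whole payload.
    return "pack-" + hashlib.sha1(pack_payload).hexdigest()

def store_pack(pack_dir: str, payload: bytes) -> bool:
    """Return True if the new pack was kept, False if a pre-existing
    pack with the same derived name caused it to be discarded."""
    path = os.path.join(pack_dir, trailer_name(payload) + ".pack")
    if os.path.exists(path):
        # "existing data wins": the later pack is thrown away even
        # if it is the one that actually completes the fetched refs.
        return False
    with open(path, "wb") as f:
        f.write(payload)
    return True
```

With a real SHA-1 collision, two *different* payloads would produce
the same name, so the second store_pack() call would return False and
the objects only present in the second pack would be lost.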
This is not the "bogus sender throws a corrupt pack at you" case,
where we check the integrity and connectivity of the objects
ourselves, and the trailer hash the sender added at the end of the
pack data stream does not even have to come into the picture.  When
we compute that trailer hash ourselves for the received pack data,
because the sender is mounting a successful collision attack (they
gave us one pack that hashes to a certain value earlier; now they
are giving us the other one), we end up computing the same value.
But even though both of these packs _are_ otherwise valid (in the
sense that each satisfactorily transfers the objects necessary to
make the refs that were fetched complete), because we name the packs
after the trailer hash and we cannot have two files with the same
name, we end up throwing away the later one.

As I said, it is a totally different matter whether this attack
scenario is a practical threat.  For one thing, it is probably
harder than applying the straight "shattered" attack to create a
single-object collision--you have to make two packs share the same
trailing hash _and_ make sure that both of them record data for
valid objects.  But I am not convinced that it would be much harder.
For example, I understand that zlib deflate can be told not to
attempt compression at all, and the crafted garbage used in the
middle part of the "shattered" attack could be such a blob object
expressed as a base object--once the attacker has two such packfiles
that hash the same, the two object names for these garbage blobs can
be used to present two versions of the values for a ref to be
fetched by these two fetch requests.
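The zlib point above can be demonstrated (again, a sketch that is not
part of the original mail, using Python's zlib as a stand-in for the
library Git links against): at compression level 0, deflate emits
"stored" blocks, so attacker-controlled garbage bytes appear verbatim
inside a perfectly valid object data stream.

```python
import zlib

# Pretend this is the crafted collision-block garbage from the
# middle of a "shattered"-style attack (hypothetical bytes).
garbage = b"\x01\x02crafted collision-block bytes\xff" * 4

# Level 0 tells deflate not to attempt compression at all; the
# payload is carried in uncompressed "stored" blocks.
co = zlib.compressobj(level=0)
deflated = co.compress(garbage) + co.flush()

# The raw bytes appear verbatim inside the deflate stream, and the
# stream still inflates back to exactly the original data.
assert garbage in deflated
assert zlib.decompress(deflated) == garbage
```

This is why such garbage can be smuggled in as the data of a base
blob object inside the pack: the zlib framing around it does not
disturb the bytes the attacker needs to control for the collision.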