On Thu, Sep 01, 2011 at 03:42:46PM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
> > I think the breakages are:
> >
> >  - The sending side does not give any indication that it _wanted_ to
> >    send ce0136 but couldn't, and ended up sending another object;

This isn't fixable without the server re-hashing every outgoing object,
though, is it? It thinks it has ce0136, but the data is corrupted. The
only way to detect that is to rehash each object that we send. That's
pretty expensive when you consider decompression and delta
reconstruction.

Besides which, it doesn't help against many malicious cases at all. It
will catch disk corruption or a malicious object-db tweak, but it does
nothing against an attacker who can trojan git, or who can spoof the
network session (which is probably hard to do if it's straight ssh, but
keep in mind that traffic may be going through cleartext proxies).

So it seems that the receiving side is the only sensible place to put
such checks. It catches problems at more levels, and it can generally
afford the extra processing load (under the assumption that pushes to
servers are smaller and less frequent than fetches).

> >  - The pack data sent over the wire was self consistent (no breakage
> >    here) and sent three well-formed objects, but it was inconsistent
> >    with respect to what history was being transferred (breakage is
> >    here);

I don't think this matters. We have thin packs, after all. What goes on
the wire doesn't have to be sensible unto itself; what matters is
whether the receiving end can do something useful with it.

> >  - The receiving end did not notice the inconsistency.
> >
> > The first one is of the lower priority, as the client side should be
> > able to notice an upstream with corruption in any case. Perhaps after
> > asking for objects between "have" and "want", "git fetch" should
> > verify that it can fully walk the subhistory that was supposed to be
> > transferred down to the blob level?

That seems like a sensible improvement to me.

> So I have a series to fix the latter "more important" half I'll be
> sending out in this thread.

If I understand correctly, your series is just about checking that we
have newly-referenced blobs. We were already checking commits and trees,
and we should already be hashing individual objects when we index the
pack. Right? In that case, the performance change should be negligible.

-Peff
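
To make "rehash each object" concrete: an object's name is the SHA-1 of
its uncompressed "type size\0body" stream, so verification just means
recomputing that hash and comparing it to the name. Below is a rough
Python sketch of the loose-object case only (packed objects would first
need delta reconstruction, which is where the expense mentioned above
comes from); verify_loose_object and its arguments are made up for
illustration and are not part of git itself.

  import hashlib, os, sys, zlib

  def verify_loose_object(objects_dir, oid):
      # Standard fan-out layout: .git/objects/ab/cdef...
      path = os.path.join(objects_dir, oid[:2], oid[2:])
      with open(path, "rb") as f:
          # Loose objects are a zlib stream over "type size\0body".
          data = zlib.decompress(f.read())
      # The object is intact iff its content still hashes to its name.
      return hashlib.sha1(data).hexdigest() == oid

  if __name__ == "__main__":
      objects_dir, oid = sys.argv[1], sys.argv[2]
      print("ok" if verify_loose_object(objects_dir, oid) else "CORRUPT", oid)

The receiving side already pays this cost anyway: index-pack has to
inflate and hash every incoming object to build the pack index, which is
why checking there is close to free, while the sender would be
re-inflating (and re-resolving deltas for) data it already trusts.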