Re: [PATCH] Implement fast hash-collision detection

Note: for some reason my email is not showing up on the mailing list.
I'm trying a different email address.  Previously my 'From' field
contained a "+git" subaddress, but Gmail won't put that in the 'Sender'
field, so possibly the email is being filtered for that reason.

On Tue, 2011-11-29 at 09:08 -0800, Shawn Pearce wrote:

> I don't think you understand how these thin packs are processed.

I think the confusion was due to my being a bit too terse.  The
documentation clearly states that thin packs allow deltas to be sent
when the delta is based on an object that the server and client have
in common, given the commits each already has.  With one server and one
client, there is no issue.  The case I meant is the one in which a user
fetches from one server and gets a forged blob, then fetches from a
second server that holds the original blob along with additional
commits on the same branch.  If that server bases a delta on the
original blob and the client applies the delta to the forged blob, the
client will most likely end up with a blob whose SHA-1 hash differs
from the one expected.  An object referenced by a tree is then missing
(no object with the expected SHA-1 hash exists), so the repository is
corrupted.
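
To make the failure mode concrete, here is a minimal Python sketch.
The apply_delta helper and the sample contents are purely illustrative
stand-ins (this is not git's delta format); the point is only that the
same delta applied to different bases yields different object ids, so
the object the tree expects never shows up:

import hashlib

def blob_id(content):
    # Git object ids for blobs are SHA-1 over "blob <size>\0<content>".
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def apply_delta(base, delta):
    # Toy stand-in for delta application; like a real delta, the result
    # depends on bytes copied out of the base object.
    return base[:delta["copy_len"]] + delta["insert"]

original = b"the original blob contents both servers started from\n"
forged   = b"a forged blob assumed (hypothetically) to collide on SHA-1\n"
delta    = {"copy_len": 12, "insert": b"plus a line added upstream\n"}

expected_id = blob_id(apply_delta(original, delta))  # id the new tree references
actual_id   = blob_id(apply_delta(forged, delta))    # id the client reconstructs
print(expected_id != actual_id)  # True: the referenced object is now missing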

The "first to arrive wins" policy isn't sufficient in one specific case:
multiple remote repositories where new commits are added asynchronously,
with the repositories out of sync possibly for days at a time (e.g.,
over a 3-day weekend).  In this case, the first to arrive at one
repository may not be the first to arrive at another, so what happens at
a particular client in the presence of hash collisions is dependent on
the sequence of remotes from which updates were fetched.  The risk
occurs in the window where the repositories are out of sync.
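
For what it's worth, the check itself is cheap to describe.  A rough
Python sketch of the general idea (not the actual patch; the
store_object helper and the dict standing in for the object database
are hypothetical):

import hashlib

def object_id(obj_type, content):
    # Git object ids are SHA-1 over "<type> <size>\0<content>".
    header = b"%s %d\0" % (obj_type, len(content))
    return hashlib.sha1(header + content).hexdigest()

def store_object(store, obj_type, content):
    # 'store' stands in for the object database: object id -> raw bytes.
    oid = object_id(obj_type, content)
    if oid in store:
        if store[oid] != content:
            # Same id, different bytes: a genuine SHA-1 collision.
            raise ValueError("hash collision detected for object " + oid)
        return oid  # identical content: "first to arrive wins", nothing to do
    store[oid] = content
    return oid

Rejecting (or at least flagging) the mismatch at write time is the kind
of early detection that keeps the fetch-order dependence above from
silently corrupting a repository.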

Regarding the kernel.org problem that you used as a separate example,
while it was fortunately possible to rebuild things (and git provided
significant advantages), earlier detection of the problem might have
reduced the time for which kernel.org was down.  Early detection of
errors in general is a good practice if it can be done at a reasonable
cost.

> Trust. Review. Verify.

While that is good advice in principle, keep in mind that there are a
lot of people out there working at various companies who are not as
capable as you are.  Some of them are overworked and make mistakes
because they've been working 16-hour days for weeks trying to meet a
deadline.  Given that, extra checks that catch problems early are
probably a good idea if they don't impact performance significantly.



