Re: [PATCH 2/4] hash.h: scaffolding for _fast hashing variants

Patrick Steinhardt <ps@xxxxxx> · Wed, 4 Sep 2024 09:05:28 +0200

On Tue, Sep 03, 2024 at 01:47:09PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
> 
> > I discussed this with brian in the sub-thread where I am talking to
> > them, but I think this is already the case. The pack is read in
> > index-pack and the checksum is verified without using the _fast hash
> > functions, so we would detect:
> >
> >   - either half of a colliding pair of objects, when reading individual
> >     objects' contents to determine their SHA-1s, or
> >
> >   - a colliding pack checksum, when computing the whole pack's checksum
> >     (which also does not use the _fast variants of these functions), and
> >
> >   - a mismatched pack checksum, when verifying the pack's checksum
> >     against the one stored in the pack.
> >
> >>  (2) devise a transition plan to use a hash function that computes a
> >>      value that is different from SHA-1 (or SHA-256 for that
> >>      matter); and
> >>
> >>  (3) pick a hash function that computes a lot faster but is insecure
> >>      and transition to it.
> >
> > So I do not think that either of these two steps are necessary.
> 
> I suspect that it is a wrong conclusion, as I meant (1) to be
> prerequisite for doing (2) and (3), that gives us the real benefit
> of being able to go faster than SHA1DC or even SHA-256.  If (1) is
> unnecessary (because it is already covered), that is great---we can
> directly jump to (2) and (3).

Ah, so the idea would be to not introduce SHA1_fast, but instead use a
hash function that is explicitly designed for fast hashing, like xxHash
[1]? Comparing the numbers, I think this makes a lot of sense: XXH3, for
example, hashes at 31.5GB/s whereas SHA1 hashes at 0.8GB/s (if you
believe the numbers on their site).

Doing this for data structures is almost a no-brainer if you
ask me. For packfiles it's a bit more complicated as we also have to
consider backwards compatibility -- a server of course cannot just start
to send packfiles that use xxHash.
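
To illustrate the compatibility concern, here is a rough sketch of a
file with a trailing checksum. It is not Git code: the helper names and
the layout are made up for illustration, and CRC32 merely stands in for
a fast non-cryptographic hash like XXH3, which is not in the Python
standard library.

```python
import hashlib
import zlib

# Hypothetical helpers for illustration only; none of these names or
# layouts come from Git.

def sha1_trailer(payload: bytes) -> bytes:
    """20-byte trailer, the kind of checksum an existing reader expects."""
    return hashlib.sha1(payload).digest()

def fast_trailer(payload: bytes) -> bytes:
    """4-byte trailer. CRC32 is an assumed stand-in for a fast
    non-cryptographic hash such as XXH3."""
    return zlib.crc32(payload).to_bytes(4, "big")

def write_file(payload: bytes, trailer_fn) -> bytes:
    """Append the checksum of the payload to the payload itself."""
    return payload + trailer_fn(payload)

def verify_file(blob: bytes, trailer_fn, trailer_len: int) -> bool:
    """Recompute the checksum over everything but the trailer and
    compare it with the stored trailer."""
    if len(blob) < trailer_len:
        return False
    payload, trailer = blob[:-trailer_len], blob[-trailer_len:]
    return trailer_fn(payload) == trailer
```

A reader that knows about the new trailer still detects accidental
corruption, but a reader that expects a SHA-1 trailer rejects the very
same bytes. That does not matter for files that are only ever read
locally, but it is what a server would have to negotiate before sending
such a pack.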

Patrick

[1]: https://github.com/Cyan4973/xxHash