On Tue, Sep 03, 2024 at 01:47:09PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> > I discussed this with brian in the sub-thread where I am talking to
> > them, but I think this is already the case. The pack is read in
> > index-pack and the checksum is verified without using the _fast hash
> > functions, so we would detect:
> >
> >   - either half of a colliding pair of objects, when reading
> >     individual objects' contents to determine their SHA-1s, or
> >
> >   - a colliding pack checksum, when computing the whole pack's
> >     checksum (which also does not use the _fast variants of these
> >     functions), and
> >
> >   - a mismatched pack checksum, when verifying the pack's checksum
> >     against the one stored in the pack.
> >
> >> (2) devise a transition plan to use a hash function that computes a
> >>     value that is different from SHA-1 (or SHA-256 for that
> >>     matter); and
> >>
> >> (3) pick a hash function that computes a lot faster but is insecure
> >>     and transition to it.
> >
> > So I do not think that either of these two steps are necessary.
>
> I suspect that it is a wrong conclusion, as I meant (1) to be
> prerequisite for doing (2) and (3), that gives us the real benefit
> of being able to go faster than SHA1DC or even SHA-256. If (1) is
> unnecessary (because it is already covered), that is great---we can
> directly jump to (2) and (3).

Ah, so the idea would be to not introduce SHA1_fast, but instead use a
hash function that is explicitly designed for fast hashing, like xxHash
[1]? Comparing the numbers, I think this makes quite some sense: XXH3,
for example, hashes at 31.5 GB/s, whereas SHA-1 hashes at 0.8 GB/s (if
you believe the numbers on their site). The one-shot API is also about
as simple as it gets; see the sketch at the end of this mail.

Doing this for data structures is almost a no-brainer if you ask me.
For packfiles it's a bit more complicated, as we also have to consider
backwards compatibility -- a server of course cannot just start sending
packfiles that use xxHash.

Patrick

[1]: https://github.com/Cyan4973/xxHash
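
For reference, here is a minimal sketch of what hashing a buffer with
XXH3 looks like. This is not a Git patch, just the library's one-shot
API as I understand it; it assumes a recent xxHash release (0.8 or
newer, where XXH3 is stable and exposed directly via xxhash.h) and
linking with -lxxhash:

    #include <stdio.h>
    #include <string.h>
    #include <xxhash.h> /* assumes xxHash >= 0.8, where XXH3 is stable */

    int main(void)
    {
            const char *buf = "The quick brown fox jumps over the lazy dog";

            /* One-shot 64-bit XXH3 hash of the whole buffer. */
            XXH64_hash_t hash = XXH3_64bits(buf, strlen(buf));

            printf("%016llx\n", (unsigned long long)hash);
            return 0;
    }

There is also a streaming interface for data that arrives in chunks,
which is presumably what we would want for anything pack-sized.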