On Sun, Sep 01, 2024 at 08:41:36PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
>
> > After some profiling, I noticed that we spend a significant amount of
> > time in hashwrite(), which is not all that surprising. But much of that
> > time is wasted in GitHub's infrastructure, since we are using the same
> > collision-detecting SHA-1 implementation to produce a trailing checksum
> > for the pack which does not need to be cryptographically secure.
>
> Cute.
>
> I wish we could upgrade the file formats so that a writer can choose
> the hash algorithm independently from whatever the payload uses.
> Most of our uses of the tail sum are for files that are consumed
> locally in the same repository so nobody should need to know that
> you are using xxHash for the tail sum instead.

Yeah. I would actually like to get there in the long term, but of course
that change is much larger than this one (the protocol would have to be
adjusted to learn a new "tailsum" capability for callers to negotiate
which checksumming hash function they want to use, etc.).

> Except that the story is not so simple for packfiles, which are named
> after the file's tail sum, so you cannot use a hash algorithm of
> your choice independently without affecting other folks. All other
> csum-file protected file types are a lot smaller than the pack files,
> too small to matter, sadly.

Right.

Thanks,
Taylor
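
For readers following along, here is a minimal sketch of the trailing-checksum
idea discussed in this thread. It is not Git's csum-file code: the function
names (write_with_tailsum, verify_tailsum, pack_name) and the "algo" parameter
are hypothetical, and SHA-256 merely stands in for "some other hash" so the
example needs only the Python standard library.

```python
import hashlib


def write_with_tailsum(payload: bytes, algo: str = "sha1") -> bytes:
    """Return the payload followed by a checksum computed over the payload."""
    return payload + hashlib.new(algo, payload).digest()


def verify_tailsum(data: bytes, algo: str = "sha1") -> bool:
    """Recompute the checksum over everything but the tail and compare."""
    size = hashlib.new(algo).digest_size
    if len(data) < size:
        return False
    payload, tail = data[:-size], data[-size:]
    return hashlib.new(algo, payload).digest() == tail


def pack_name(data: bytes, algo: str = "sha1") -> str:
    """Hypothetical naming helper: derive a file name from the tail sum."""
    size = hashlib.new(algo).digest_size
    return "pack-" + data[-size:].hex() + ".pack"


if __name__ == "__main__":
    payload = b"example pack contents"
    sha1_file = write_with_tailsum(payload, "sha1")
    other_file = write_with_tailsum(payload, "sha256")  # stand-in for another hash

    print(verify_tailsum(sha1_file, "sha1"))
    # Same payload, different tail-sum algorithm, different file name.
    print(pack_name(sha1_file, "sha1") != pack_name(other_file, "sha256"))
```

The second comparison illustrates the point about packfiles: if the name is
derived from the tail sum, then changing the tail-sum algorithm changes the
name that everyone else sees, even though the payload is identical.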