On Sun, Sep 01, 2024 at 12:03:24PM -0400, Taylor Blau wrote: > Git's default SHA-1 implementation is collision-detecting, which hardens > us against known SHA-1 attacks against Git objects. This makes Git > object writes safer at the expense of some speed when hashing through > the collision-detecting implementation, which is slower than > non-collision detecting alternatives. > > Prepare for loading a separate "fast" SHA-1 implementation that can be > used for non-cryptographic purposes, like computing the checksum of > files that use the hashwrite() API. > > This commit does not actually introduce any new compile-time knobs to > control which implementation is used as the fast SHA-1 variant, but does > add scaffolding so that the "git_hash_algo" structure has five new > function pointers which are "fast" variants of the five existing > hashing-related function pointers: > > - git_hash_init_fn fast_init_fn > - git_hash_clone_fn fast_clone_fn > - git_hash_update_fn fast_update_fn > - git_hash_final_fn fast_final_fn > - git_hash_final_oid_fn fast_final_oid_fn > > The following commit will introduce compile-time knobs to specify which > SHA-1 implementation is used for non-cryptographic uses. While the property we care about in the context of this patch series indeed is that the second hash is faster, I think the more important property is that it's insecure. If I were seeing two APIs, one labelled fast and one labelled slow, I would of course pick the fast one. So I wonder whether we should rename things accordingly so that developers aren't intrigued to pick the fast one without thinking, and also to have a more useful signal that stands out to reviewers. > Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> > --- > hash.h | 42 ++++++++++++++++++++++++++++++++++++++++++ > object-file.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 84 insertions(+) > > diff --git a/hash.h b/hash.h > index 72ffbc862e5..f255e5c1e8a 100644 > --- a/hash.h > +++ b/hash.h > @@ -44,14 +44,32 @@ > #define platform_SHA1_Final SHA1_Final > #endif > > +#ifndef platform_SHA_CTX_fast > +#define platform_SHA_CTX_fast platform_SHA_CTX > +#define platform_SHA1_Init_fast platform_SHA1_Init > +#define platform_SHA1_Update_fast platform_SHA1_Update > +#define platform_SHA1_Final_fast platform_SHA1_Final > +#ifdef platform_SHA1_Clone > +#define platform_SHA1_Clone_fast platform_SHA1_Clone > +#endif > +#endif We may want to apply our new coding guidelines around nested preprocessor directives, which should also use indenting. > @@ -222,6 +249,21 @@ struct git_hash_algo { > /* The hash finalization function for object IDs. */ > git_hash_final_oid_fn final_oid_fn; > > + /* The fast hash initialization function. */ Providing some context here why there are two sets of functions would help future readers. > @@ -219,6 +256,11 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = { > .update_fn = git_hash_sha256_update, > .final_fn = git_hash_sha256_final, > .final_oid_fn = git_hash_sha256_final_oid, > + .fast_init_fn = git_hash_sha256_init, > + .fast_clone_fn = git_hash_sha256_clone, > + .fast_update_fn = git_hash_sha256_update, > + .fast_final_fn = git_hash_sha256_final, > + .fast_final_oid_fn = git_hash_sha256_final_oid, > .empty_tree = &empty_tree_oid_sha256, > .empty_blob = &empty_blob_oid_sha256, > .null_oid = &null_oid_sha256, I was briefly wondering whether we'd rather want to have automatic fallbacks to the secure alternative when the fast one isn't set. I guess in the end that's not really worth it though, as it saves us one branch when not having such a fallback. And we only have so many hashes, so it's not like this would be a huge pain to maintain. Patrick