On 2024-11-21 at 09:37:31, Jeff King wrote: > On Wed, Nov 20, 2024 at 02:13:50PM -0500, Taylor Blau wrote: > > > +static const struct git_hash_algo sha1_unsafe_algo = { > > + .name = "sha1", > > + .format_id = GIT_SHA1_FORMAT_ID, > > + .rawsz = GIT_SHA1_RAWSZ, > > + .hexsz = GIT_SHA1_HEXSZ, > > + .blksz = GIT_SHA1_BLKSZ, > > + .init_fn = git_hash_sha1_init_unsafe, > > + .clone_fn = git_hash_sha1_clone_unsafe, > > + .update_fn = git_hash_sha1_update_unsafe, > > + .final_fn = git_hash_sha1_final_unsafe, > > + .final_oid_fn = git_hash_sha1_final_oid_unsafe, > > + .empty_tree = &empty_tree_oid, > > + .empty_blob = &empty_blob_oid, > > + .null_oid = &null_oid_sha1, > > +}; > > All of the non-function fields here naturally must match the ones in the > parent algo struct, or chaos ensues. That's a little fragile, but it's > not like we're adding new algorithm variants a lot. The biggest risk, I > guess, would be adding a new field to git_hash_algo which defaults to > zero-initialization here. But again, there are only three total and they > are defined near each other here, so I don't think it's a big risk > overall. > > I think this struct is a potential landmine for hash_algo_by_ptr(): > > static inline int hash_algo_by_ptr(const struct git_hash_algo *p) > { > return p - hash_algos; > } > > It's undefined behavior to pass in sha1_unsafe_algo to this function > (but the compiler would not complain since the types are the same). I > don't find it incredibly likely that somebody would want to do that on > an unsafe variant, but I'm not thrilled about leaving that wart for > somebody to find. > > If we don't care about the speed of this function, then an > implementation like: > > for (i = 0; i < GIT_HASH_NALGOS; i++) { > if (p == &hash_algos[i] || p == hash_algos[i]->unsafe) > return i; > } > return GIT_HASH_UNKNOWN; > > would work. I'm not sure if that would be measurable. I was surprised at > the number of places that hash_algo_by_ptr() is called. Many low-level > oid functions need it because we store the integer id there rather than > a direct pointer (so oidread(), oidclr(), oid_object_info_extended(), > and so on). But I'd also expect the loop above to be pretty fast. So I > dunno. Yeah, I'm also a little nervous about this change with hash_algo_by_ptr. I think we had discussed in the other thread about doing something like this: struct git_hash_algo_fns { /* The hash initialization function. */ git_hash_init_fn init_fn; /* The hash context cloning function. */ git_hash_clone_fn clone_fn; /* The hash update function. */ git_hash_update_fn update_fn; /* The hash finalization function. */ git_hash_final_fn final_fn; /* The hash finalization function for object IDs. */ git_hash_final_oid_fn final_oid_fn; }; and then doing this: struct git_hash_algo { [...] struct git_hash_algo_fns fns; struct git_hash_algo_fns unsafe_fns; [...] }; That would mean that we'd have to deal with two pointers in your first patch, but I don't really think that's too terrible. And, since this approach doesn't result in an extra struct git_hash_algo, it's impossible to misuse hash_algo_by_ptr. I'm not a hard no on the current approach, but I think the above would probably be a little safer and harder to misuse. It would require extra patches to convert existing callers, though, but they aren't hugely numerous, and the conversion could probably be mostly automated. -- brian m. carlson (they/them or he/him) Toronto, Ontario, CA
Attachment:
signature.asc
Description: PGP signature