On 2024-11-06 at 01:38:48, Junio C Hamano wrote: > Taylor Blau <me@xxxxxxxxxxxx> writes: > > > -int cmd_hash_impl(int ac, const char **av, int algo) > > +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe) > > { > > git_hash_ctx ctx; > > unsigned char hash[GIT_MAX_HEXSZ]; > > @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo) > > die("OOPS"); > > } > > > > - algop->init_fn(&ctx); > > + if (unsafe) > > + algop->unsafe_init_fn(&ctx); > > + else > > + algop->init_fn(&ctx); > > It may be just me, and it would not matter all that much within the > context of the project because this is merely a test helper, but > giving a pair of init/unsafe_init methods to algop looks unnerving. > It gives an impression that every design of hash algorithm must come > with normal and unsafe variant, which is not what you want to say. > > I would have expected that there are different algorighm instances, > and one of them would be "unsafe SHA-1", among "normal SHA-1" and > "SHA-256" (as the last one would not even have unsafe_init_fn > method), and the callers that want to measure the performance of > each algorithm would simply pick one of these instances and go > through the usual "init", "update", "final" flow, regardless of the > "unsafe" ness of the algorithm. > > IOW, ... > > > while (1) { > > ssize_t sz, this_sz; > > @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo) > > } > > if (this_sz == 0) > > break; > > - algop->update_fn(&ctx, buffer, this_sz); > > + if (unsafe) > > + algop->unsafe_update_fn(&ctx, buffer, this_sz); > > + else > > + algop->update_fn(&ctx, buffer, this_sz); > > } > > - algop->final_fn(hash, &ctx); > > + if (unsafe) > > + algop->unsafe_final_fn(hash, &ctx); > > + else > > + algop->final_fn(hash, &ctx); > > ... I didn't expect any of these "if (unsafe) .. else .." switches. I don't think we can remove those, and here's why. Certainly Taylor can correct me if I'm off base, though. In the normal case, our hash struct is a union, and different implementations can have a different layout. A SHA-1 implementation will usually keep track of a 64-bit size, 5 32-bit words, and up to 64 bytes of data that might need to be processed. Maybe SHA-1-DC, our safe implementation, stores the size first, but our unsafe implementation is OpenSSL and it stores the 5 hash words first, or whatever. If we use the same update and final functions, we'll end up with incorrect data because we'll have initialized the union contents with data for one implementation but are trying to update or finalize it with different data, which in the very best case will just produce broken results, and might just cause a segfault (if one of the implementations stores a pointer instead of storing the data in the union). We could certainly adopt a different hash algorithm structure for safe and unsafe code, but we have a lot of places that assume that there's just one structure per algorithm. It would require a substantial amount of refactoring to remove those assumptions and deal with the fact that we now have two SHA-1 implementations. It would also be tricky to deal with the fact that for SHA-1, we might use the safe or unsafe algorithm, but for SHA-256, there's only one algorithm structure and we need to use it for both. -- brian m. carlson (they/them or he/him) Toronto, Ontario, CA
Attachment:
signature.asc
Description: PGP signature