On Thu, Nov 07, 2024 at 12:48:26AM +0000, brian m. carlson wrote: > > I would have expected that there are different algorighm instances, > > and one of them would be "unsafe SHA-1", among "normal SHA-1" and > > "SHA-256" (as the last one would not even have unsafe_init_fn > > method), and the callers that want to measure the performance of > > each algorithm would simply pick one of these instances and go > > through the usual "init", "update", "final" flow, regardless of the > > "unsafe" ness of the algorithm. > [...] > > ... I didn't expect any of these "if (unsafe) .. else .." switches. > > I don't think we can remove those, and here's why. Certainly Taylor can > correct me if I'm off base, though. > > In the normal case, our hash struct is a union, and different > implementations can have a different layout. A SHA-1 implementation > will usually keep track of a 64-bit size, 5 32-bit words, and up to 64 > bytes of data that might need to be processed. Maybe SHA-1-DC, our safe > implementation, stores the size first, but our unsafe implementation > is OpenSSL and it stores the 5 hash words first, or whatever. > > If we use the same update and final functions, we'll end up with > incorrect data because we'll have initialized the union contents with > data for one implementation but are trying to update or finalize it with > different data, which in the very best case will just produce broken > results, and might just cause a segfault (if one of the implementations > stores a pointer instead of storing the data in the union). I don't think Junio is proposing mixing the functions on a single data type. Which, as you note, would be a disaster. I think the idea is for a separate git_hash_algo struct entirely, that can be slotted in as a pointer (since git_hash_algo is already essentially a virtual type). That gets weird as you say here: > We could certainly adopt a different hash algorithm structure for safe > and unsafe code, but we have a lot of places that assume that there's > just one structure per algorithm. It would require a substantial amount > of refactoring to remove those assumptions and deal with the fact that > we now have two SHA-1 implementations. It would also be tricky to deal > with the fact that for SHA-1, we might use the safe or unsafe algorithm, > but for SHA-256, there's only one algorithm structure and we need to use > it for both. both because we have two different algos for "sha1", but also because they are not _really_ independent. If the_hash_algo is sha1, then whichever implementation a given piece of code is using, it must still be one of the sha1 variants. So I think you wouldn't want to allocate an enum or a slot in the hash_algos array, because it is not really an independent algorithm. But I think it _could_ work as a separate struct that the caller derives from the main hash algorithm. For example, imagine that the git_hash_algo struct didn't have unsafe_init_fn, etc, but instead had: struct git_hash_algo *unsafe_implementation; with a matching accessor like: struct git_hash_algo *unsafe_hash_algo(struct git_hash_algo *algo) { /* if we have a faster "unsafe" implementation, use that */ if (algo->unsafe_implementation) return algo->unsafe_implementation; /* otherwise just use the default one */ return algo; } And then csum-file.c, rather than calling: the_hash_algo->unsafe_init_fn(...); ... the_hash_algo->unsafe_final_fn(...); would do this: struct git_hash_algo *algo = unsafe_hash_algo(the_hash_algo); algo->init_fn(...); ... algo->final_fn(...); And likewise this test helper would have a single conditional at the start: if (unsafe) algo = unsafe_hash_algo(algo); and the rest of the code would just use that. All that said, I do not think it buys us anything for "real" code. There the decision between safe/unsafe is in the context of how we are using it, and not based on some conditional. So while the code in this test helper is ugly, I don't think we'd ever write anything like that for real. So it may not be worth the effort to refactor into a more virtual object-oriented way. -Peff