Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode

Taylor Blau <me@xxxxxxxxxxxx> · Thu, 7 Nov 2024 16:30:37 -0500

On Thu, Nov 07, 2024 at 01:49:35AM +0000, brian m. carlson wrote:
> On 2024-11-07 at 01:39:15, Jeff King wrote:
> > So I think you wouldn't want to allocate an enum or a slot in the
> > hash_algos array, because it is not really an independent algorithm.
> > But I think it _could_ work as a separate struct that the caller derives
> > from the main hash algorithm. For example, imagine that the
> > git_hash_algo struct didn't have unsafe_init_fn, etc, but instead had:
> >
> >   struct git_hash_algo *unsafe_implementation;
> >
> > with a matching accessor like:
> >
> >   struct git_hash_algo *unsafe_hash_algo(struct git_hash_algo *algo)
> >   {
> > 	/* if we have a faster "unsafe" implementation, use that */
> > 	if (algo->unsafe_implementation)
> > 		return algo->unsafe_implementation;
> > 	/* otherwise just use the default one */
> > 	return algo;
> >   }
> >
> > And then csum-file.c, rather than calling:
> >
> >   the_hash_algo->unsafe_init_fn(...);
> >   ...
> >   the_hash_algo->unsafe_final_fn(...);
> >
> > would do this:
> >
> >   struct git_hash_algo *algo = unsafe_hash_algo(the_hash_algo);
> >   algo->init_fn(...);
> >   ...
> >   algo->final_fn(...);
> >
> > And likewise this test helper would have a single conditional at the
> > start:
> >
> >   if (unsafe)
> > 	algo = unsafe_hash_algo(algo);
> >
> > and the rest of the code would just use that.
>
> Ah, yes, I think that approach would be simpler and nicer to work with
> than a separate entry in the `hash_algos` array.  We do, however, have
> some places that assume that a `struct git_hash_algo *` points into the
> `hash_algos` array (notably `hash_algo_by_ptr`), so we'd have to adjust
> for that, move the function pointers out into their own struct which
> we'd use for `unsafe_hash_algo`, or be careful never to call the
> relevant functions on our special `git_hash_algo` struct.

I agree that the approach suggested by Peff here is clean and would
button up the test-helper code nicely. It may be worth doing in the
future, but I agree that it doesn't seem entirely in scope here.

Part of the reason that I did not initially add a separate sha1-unsafe
struct is that while it avoids this awkwardness, it creates new
awkwardness when in a non-SHA-1 repository, where you have to keep
track of whether you want to use the_unsafe_hash_algo or the SHA-256
hash algo.

In that instance, it's not a matter of computing the same result two
different ways, but rather using the wrong hash function entirely. IOW,
the hashfile API would have to realize that when in SHA-256 mode, that
it should *not* use the separate (hypothetical) unsafe SHA-1 struct.

> > All that said, I do not think it buys us anything for "real" code. There
> > the decision between safe/unsafe is in the context of how we are using
> > it, and not based on some conditional. So while the code in this test
> > helper is ugly, I don't think we'd ever write anything like that for
> > real. So it may not be worth the effort to refactor into a more virtual
> > object-oriented way.
>
> Yeah, I don't have a strong opinion one way or the other.  I think, with
> the limitation I mentioned above, it would probably require a decent
> amount of refactoring if we took a different approach, and I'm fine with
> going with Taylor's current approach unless he wants to do that
> refactoring (in which case, great).

I think it does buy you something for real code, which is that you don't
have to remember to consistently call the unsafe_ variants of all of the
various function pointers.

For instance, if you do

    the_hash_algo->unsafe_init_fn(...);

early on, and then later on by mistake write:

    the_hash_algo->update_fn(...);

Then your code is broken and will (as brian said) either in the best
case produce wrong results, or likely segfault.

So in that sense it is worth doing as it avoids the caller from having
to keep track of what "mode" they are using the_hash_algo in. But let's
take it up later, I think.

Thanks,
Taylor