Re: [PATCH v2 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants

Jeff King <peff@xxxxxxxx> · Fri, 10 Jan 2025 21:42:03 -0500

On Fri, Jan 10, 2025 at 04:29:38PM -0500, Taylor Blau wrote:

> > I didn't see any response to the review in round 1 about the pointer
> > dangers in patch 3. What do you think of using a separate
> > git_hash_algo_fns struct, with the one-time conversion I showed in the
> > subthread of:
> >
> >   https://lore.kernel.org/git/20241121093731.GD602681@xxxxxxxxxxxxxxxxxxxxxxx/
> >
> > ?
> 
> Oops. I must have missed those messages; and sure enough when focusing
> my inbox on this thread they are indeed unread :-).
> 
> That said, I am not sure that that direction is one that I'd want to go
> in. Part of the goal of this series is to make it possible to mix safe
> and unsafe function calls on the same hash function. So doing something
> like:
> 
>     struct git_hash_algo *algop;
> 
>     algop->init_fn(&ctx);
> 
> in one part of the code, and then (using the same algop) calling:
> 
>     algop->unsafe_final_fn(...);
> 
> should be impossible to do to. The benefit of having only a single set
> of functions implemented on the git_hash_algo type is that it is
> impossible to mix the two: you'd have to use a different git_hash_algo
> altogether!
> 
> So porting the above example to your and brian's git_hash_algo_fns
> struct, you'd still be able to do:
> 
>     algop->fn.init(&ctx);
> 
> in one part of the code and algop->unsafe_fn.final(...) in another part,
> which doesn't appear to me to be safer than the current situation that
> this series aims to solve.

I think what that proposal is doing is orthogonal to the goal of your
series. You'd still have an unsafe_hash_algo() function, but it would
return a git_hash_algo_fns struct, and that's what struct hashfile would
store. So your patches would still remain.

The advantage is mostly that you can't confuse it with a regular
git_hash_algo struct, so it avoids the pointer and hash-id issues.

I do think there is one gotcha, though, which is that the hashfile code
probably still needs the outer algop pointer for things like
algop->raw_size. So you'd have to store both.

It's _possible_ to still confuse the two, but the idea is that you'd
have to explicitly call algop->fn, to get the wrong one there. If we
wanted to make that harder to get wrong, we could start making it a
habit to never use algo->fn directly, but to ask for the safe/unsafe
git_hash_algo_fns struct. But that would be even more churn in the
surrounding code. I think just doing it consistently within hashfile
(which is the only unsafe user) would be sufficient.

-Peff