Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode

"brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> · Thu, 7 Nov 2024 00:48:26 +0000

On 2024-11-06 at 01:38:48, Junio C Hamano wrote:
> Taylor Blau <me@xxxxxxxxxxxx> writes:
> 
> > -int cmd_hash_impl(int ac, const char **av, int algo)
> > +int cmd_hash_impl(int ac, const char **av, int algo, int unsafe)
> >  {
> >  	git_hash_ctx ctx;
> >  	unsigned char hash[GIT_MAX_HEXSZ];
> > @@ -27,7 +27,10 @@ int cmd_hash_impl(int ac, const char **av, int algo)
> >  			die("OOPS");
> >  	}
> >  
> > -	algop->init_fn(&ctx);
> > +	if (unsafe)
> > +		algop->unsafe_init_fn(&ctx);
> > +	else
> > +		algop->init_fn(&ctx);
> 
> It may be just me, and it would not matter all that much within the
> context of the project because this is merely a test helper, but
> giving a pair of init/unsafe_init methods to algop looks unnerving.
> It gives an impression that every design of hash algorithm must come
> with normal and unsafe variant, which is not what you want to say.
> 
> I would have expected that there are different algorighm instances,
> and one of them would be "unsafe SHA-1", among "normal SHA-1" and
> "SHA-256" (as the last one would not even have unsafe_init_fn
> method), and the callers that want to measure the performance of
> each algorithm would simply pick one of these instances and go
> through the usual "init", "update", "final" flow, regardless of the
> "unsafe" ness of the algorithm.
> 
> IOW, ...
> 
> >  	while (1) {
> >  		ssize_t sz, this_sz;
> > @@ -46,9 +49,15 @@ int cmd_hash_impl(int ac, const char **av, int algo)
> >  		}
> >  		if (this_sz == 0)
> >  			break;
> > -		algop->update_fn(&ctx, buffer, this_sz);
> > +		if (unsafe)
> > +			algop->unsafe_update_fn(&ctx, buffer, this_sz);
> > +		else
> > +			algop->update_fn(&ctx, buffer, this_sz);
> >  	}
> > -	algop->final_fn(hash, &ctx);
> > +	if (unsafe)
> > +		algop->unsafe_final_fn(hash, &ctx);
> > +	else
> > +		algop->final_fn(hash, &ctx);
> 
> ... I didn't expect any of these "if (unsafe) .. else .." switches.

I don't think we can remove those, and here's why.  Certainly Taylor can
correct me if I'm off base, though.

In the normal case, our hash struct is a union, and different
implementations can have a different layout.  A SHA-1 implementation
will usually keep track of a 64-bit size, 5 32-bit words, and up to 64
bytes of data that might need to be processed.  Maybe SHA-1-DC, our safe
implementation, stores the size first, but our unsafe implementation
is OpenSSL and it stores the 5 hash words first, or whatever.

If we use the same update and final functions, we'll end up with
incorrect data because we'll have initialized the union contents with
data for one implementation but are trying to update or finalize it with
different data, which in the very best case will just produce broken
results, and might just cause a segfault (if one of the implementations
stores a pointer instead of storing the data in the union).

We could certainly adopt a different hash algorithm structure for safe
and unsafe code, but we have a lot of places that assume that there's
just one structure per algorithm.  It would require a substantial amount
of refactoring to remove those assumptions and deal with the fact that
we now have two SHA-1 implementations.  It would also be tricky to deal
with the fact that for SHA-1, we might use the safe or unsafe algorithm,
but for SHA-256, there's only one algorithm structure and we need to use
it for both.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA
Attachment:
signature.asc

Description: PGP signature