Re: [PATCH] random: use computational hash for entropy extraction

"Jason A. Donenfeld" <Jason@xxxxxxxxx> · Fri, 4 Feb 2022 14:24:07 +0100

Hi Eric,

Thanks for your comments.

On Fri, Feb 4, 2022 at 9:46 AM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> What this patch does makes sense, but I'm having a hard time seeing how it maps
> to the paper cited above.  Your code seems to be treating BLAKE2s as an
> arbitrary-length PRF, but "Construction 8" in section B.1 of the paper is
> working with the raw compression function of a hash function.  Can you clarify?

When academics say "based on Merkle-Damgard", they just mean iterative
hashing. In this case, we have:

    refresh_F(s, x) = F(s, x), where s is the hash state, and x is a new input.

Every time new inputs are refreshed into it, they're compressed
together with the prior. In other words, "what a hash function does"
that broadly uses the Merkle-Damgard model, which BLAKE2s does. Modern
real hash functions also have a bit of extra book keeping - things
like finalization to prevent length extension and such - but these are
still considered MD-based hash functions. We're not going to excavate
the raw BLAKE2s compression function. In particular, I like that we
don't need to do anything funky and can just use something off the
shelf

Actually, though, I should have cited sections 5.1 and 6.4 (fixed now
for v2), which show a model of a next function for a full PRNG rather
than just a finalize for an extractor, which might pique your
interest, as it shows the instantiation of the new state (with F(s,
0)).

With regards to PRF vs a key-less hash function: for BLAKE2s, the
former reduces to the latter, in the sense that keyed BLAKE2s is the
same as ordinary BLAKE2s, except the first thing hashed in is the key
followed by 32 zero bytes (and the IV has a single bit change for
domain separation). We would be totally fine omitting those zeros --
from an entropy perspective they're obviously not adding anything --
but the fact that BLAKE2s is specified like this I think will make the
modeling a bit cleaner and easier. Practically speaking, it doesn't
really matter at all. About the most you could say is that we could
probably be *less* careful and do slightly *less* hashing and still
have a very good construction, but that I'd like to do a bit more.

>
> > -/*
> > - * Originally, we used a primitive polynomial of degree .poolwords
> > - * over GF(2).  The taps for various sizes are defined below.  They
> > - * were chosen to be evenly spaced except for the last tap, which is 1
> > - * to get the twisting happening as fast as possible.
> > - *
>
> The "Theory of operation" comment at the top of the file needs to be updated
> too.

Thanks, I just got rid of that quite outdated section and other things
that reference the old LFSR approach for v2.

There are so many other issues with the documentation comments in
random.c too, that I think I'll also work on a general documentation
cleanup patch at some point down the road. It still documents the old
/dev/random behavior, suggests that RDRAND is faster than ChaCha20,
forgets about getrandom(), and so on and so forth.

> In the above comment, 'key' should be 'seed'.
> Likewise above.

Caught this too right after sending. Fixed now for v2.

Regards,
Jason