Hi Eric, Thanks for your comments. On Fri, Feb 4, 2022 at 9:46 AM Eric Biggers <ebiggers@xxxxxxxxxx> wrote: > What this patch does makes sense, but I'm having a hard time seeing how it maps > to the paper cited above. Your code seems to be treating BLAKE2s as an > arbitrary-length PRF, but "Construction 8" in section B.1 of the paper is > working with the raw compression function of a hash function. Can you clarify? When academics say "based on Merkle-Damgard", they just mean iterative hashing. In this case, we have: refresh_F(s, x) = F(s, x), where s is the hash state, and x is a new input. Every time new inputs are refreshed into it, they're compressed together with the prior. In other words, "what a hash function does" that broadly uses the Merkle-Damgard model, which BLAKE2s does. Modern real hash functions also have a bit of extra book keeping - things like finalization to prevent length extension and such - but these are still considered MD-based hash functions. We're not going to excavate the raw BLAKE2s compression function. In particular, I like that we don't need to do anything funky and can just use something off the shelf Actually, though, I should have cited sections 5.1 and 6.4 (fixed now for v2), which show a model of a next function for a full PRNG rather than just a finalize for an extractor, which might pique your interest, as it shows the instantiation of the new state (with F(s, 0)). With regards to PRF vs a key-less hash function: for BLAKE2s, the former reduces to the latter, in the sense that keyed BLAKE2s is the same as ordinary BLAKE2s, except the first thing hashed in is the key followed by 32 zero bytes (and the IV has a single bit change for domain separation). We would be totally fine omitting those zeros -- from an entropy perspective they're obviously not adding anything -- but the fact that BLAKE2s is specified like this I think will make the modeling a bit cleaner and easier. Practically speaking, it doesn't really matter at all. About the most you could say is that we could probably be *less* careful and do slightly *less* hashing and still have a very good construction, but that I'd like to do a bit more. > > > -/* > > - * Originally, we used a primitive polynomial of degree .poolwords > > - * over GF(2). The taps for various sizes are defined below. They > > - * were chosen to be evenly spaced except for the last tap, which is 1 > > - * to get the twisting happening as fast as possible. > > - * > > The "Theory of operation" comment at the top of the file needs to be updated > too. Thanks, I just got rid of that quite outdated section and other things that reference the old LFSR approach for v2. There are so many other issues with the documentation comments in random.c too, that I think I'll also work on a general documentation cleanup patch at some point down the road. It still documents the old /dev/random behavior, suggests that RDRAND is faster than ChaCha20, forgets about getrandom(), and so on and so forth. > In the above comment, 'key' should be 'seed'. > Likewise above. Caught this too right after sending. Fixed now for v2. Regards, Jason