On Thu, Feb 17, 2022 at 12:21:42AM +0100, Jason A. Donenfeld wrote:
> Rather than the clunky NUMA full ChaCha state system we had prior, this
> commit is closer to the original "fast key erasure RNG" proposal from
> <https://blog.cr.yp.to/20170723-random.html>, by simply treating ChaCha
> keys on a per-cpu basis.
>
> All entropy is extracted to a base crng key of 32 bytes. This base crng
> has a birthdate and a generation counter. When we go to take bytes from
> the crng, we first check if the birthdate is too old; if it is, we
> reseed per usual. Then we start working on a per-cpu crng.
>
> This per-cpu crng makes sure that it has the same generation counter as
> the base crng. If it doesn't, it does fast key erasure with the base
> crng key and uses the output as its new per-cpu key, and then updates
> its local generation counter. Then, using this per-cpu state, we do
> ordinary fast key erasure. Half of this first block is used to overwrite
> the per-cpu crng key for the next call -- this is the fast key erasure
> RNG idea -- and the other half, along with the ChaCha state, is returned
> to the caller. If the caller desires more than this remaining half, it
> can generate more ChaCha blocks, unlocked, using the now detached ChaCha
> state that was just returned. Crypto-wise, this is more or less what we
> were doing before, but this simply makes it more explicit and ensures
> that we always have backtrack protection by not playing games with a
> shared block counter.
>
> The flow looks like this:
>
> ──extract()──► base_crng.key ◄──memcpy()───┐
>                  │                         │
>                  └──chacha()──────┬─► new_base_key
>                                   └─► crngs[n].key ◄──memcpy()───┐
>                                             │                    │
>                                             └──chacha()───┬─► new_key
>                                                           └─► random_bytes
>                                                                     │
>                                                                     └────►
>
> There are a few hairy details around early init.
> Just as was done
> before, prior to having gathered enough entropy, crng_fast_load() and
> crng_slow_load() dump bytes directly into the base crng, and when we go
> to take bytes from the crng, in that case, we're doing fast key erasure
> with the base crng rather than the fast unlocked per-cpu crngs. This is
> fine as that's only the state of affairs during very early boot; once
> the crng initializes we never use these paths again.
>
> In the process of all this, the APIs into the crng become a bit simpler:
> we have get_random_bytes(buf, len) and get_random_bytes_user(buf, len),
> which both do what you'd expect. All of the details of fast key erasure
> and per-cpu selection happen only in a very short critical section of
> crng_make_state(), which selects the right per-cpu key, does the fast
> key erasure, and returns a local state to the caller's stack. So, we no
> longer have a need for a separate backtrack function, as this happens
> all at once here. The API then allows us to extend backtrack protection
> to batched entropy without really having to do much at all.
>
> The result is a bit simpler than before and has fewer foot guns. The
> init time state machine also gets a lot simpler as we don't need to wait
> for workqueues to come online and do deferred work. And the multi-core
> performance should be increased significantly, by virtue of having hardly
> any locking on the fast path.
>
> Cc: Theodore Ts'o <tytso@xxxxxxx>
> Cc: Dominik Brodowski <linux@xxxxxxxxxxxxxxxxxxxx>
> Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Reviewed-by: Jann Horn <jannh@xxxxxxxxxx>
> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
> ---
> Changes v3->v4:
> - Following Jann's review, base_crng.birth is now written to with
>   WRITE_ONCE.
>
>  drivers/char/random.c | 388 ++++++++++++++++++++++++------------------
>  1 file changed, 222 insertions(+), 166 deletions(-)

Looks good,

Reviewed-by: Eric Biggers <ebiggers@xxxxxxxxxx>

The only oddity I noticed is that some new comments use the net coding
style for multi-line comments, and get reformatted to the standard style
in a later patch. It would be preferable to use the standard style from
the beginning.

- Eric
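
For anyone following the quoted description, the fast key erasure step and
the generation check can be sketched roughly as below. This is an
illustrative userspace sketch, not the code from the patch: the names
fke_generate(), crng_sketch_get(), struct base_crng and struct cpu_crng are
hypothetical, there is no locking, reseeding, or per-cpu machinery, and
requests are capped at the 32 bytes left over from one ChaCha20 block
(the patch instead hands the detached ChaCha state back to the caller for
longer requests).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(a, b, c, d) do {                 \
	a += b; d ^= a; d = ROTL32(d, 16);  \
	c += d; b ^= c; b = ROTL32(b, 12);  \
	a += b; d ^= a; d = ROTL32(d, 8);   \
	c += d; b ^= c; b = ROTL32(b, 7);   \
} while (0)

static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* One ChaCha20 block: 16-word input state, 64 bytes of output. */
static void chacha20_block(const uint32_t in[16], uint8_t out[64])
{
	uint32_t x[16];
	int i;

	memcpy(x, in, sizeof(x));
	for (i = 0; i < 10; i++) {	/* 10 double rounds = 20 rounds */
		QR(x[0], x[4], x[8], x[12]);
		QR(x[1], x[5], x[9], x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		QR(x[0], x[5], x[10], x[15]);
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8], x[13]);
		QR(x[3], x[4], x[9], x[14]);
	}
	for (i = 0; i < 16; i++)
		store_le32(out + 4 * i, x[i] + in[i]);
}

/*
 * Hypothetical helper: generate one block from `key`, overwrite `key` with
 * the first 32 bytes, and copy at most 32 bytes of the second half to `out`.
 * Returns the number of bytes written to `out`.
 */
static size_t fke_generate(uint8_t key[32], uint8_t *out, size_t len)
{
	/* "expand 32-byte k"; counter and nonce words (12..15) stay zero
	 * because each key is used for exactly one block. */
	uint32_t state[16] = { 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 };
	uint8_t block[64];
	int i;

	for (i = 0; i < 8; i++)
		state[4 + i] = load_le32(key + 4 * i);
	chacha20_block(state, block);

	memcpy(key, block, 32);		/* the old key is now unrecoverable */
	if (len > 32)
		len = 32;
	memcpy(out, block + 32, len);

	/* Real code needs a wipe the compiler cannot elide (the kernel has
	 * memzero_explicit()); plain memset is only for illustration. */
	memset(block, 0, sizeof(block));
	memset(state, 0, sizeof(state));
	return len;
}

/* Hypothetical stand-ins for the base and per-cpu state in the description. */
struct base_crng { uint8_t key[32]; unsigned long generation; };
struct cpu_crng  { uint8_t key[32]; unsigned long generation; };

static size_t crng_sketch_get(struct base_crng *base, struct cpu_crng *cpu,
			      uint8_t *out, size_t len)
{
	/* Stale per-cpu key: rekey it from the base key, which is itself
	 * advanced by fast key erasure in the same step. */
	if (cpu->generation != base->generation) {
		fke_generate(base->key, cpu->key, 32);
		cpu->generation = base->generation;
	}
	return fke_generate(cpu->key, out, len);
}
```

A caller would keep one struct cpu_crng per CPU and call crng_sketch_get()
with a buffer of up to 32 bytes. Bumping base->generation after a reseed is
what forces every per-cpu key to be rederived on its next use.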