Re: [PATCH] arch/s390/crypto/prng: Stop being stupidly wasteful with /dev/urandom

(Apologies for the preceding mangled mail.  I lost my connection
to my mail host and it has a bad habit of sending the current
edit buffer when I disconnect unexpectedly.)

On Wed, 3 Apr 2019 at 12:51:48 +0200, Harald Freudenberger wrote
> Then someone explained to me that a sha256 can never produce 256 bits of
> entropy as there may exist collisions.  So one must assume that the output
> of sha256 will have 255 bits of entropy at most.  However, I decided to
> double all the buffers and use sha512 to be on the safe side and be able
> to uphold the statement about the 256 bits of entropy within the 32 bytes
> of random produced.

I just spent a couple of hours writing a long explanation of the
statistics of random functions and the effect of collisions on
entropy before I started wondering what the seed material was actually
used for and traced the code back through dz9zr010.pdf, which told me
this whole thing is just an implementation of NIST SP800-90A!

(Specifically, it's an implementation of Hash_DRBG using SHA-512 as
a base.)

That changes lots of things.  There's no upper limit on seed size
(okay, 2^35 bits), but 256 bits is a desired *lower* limit on seed entropy.

Things make a lot more sense now.  Whenever you want n bits of entropy,
you want to avoid bottlenecks.  There's a wide uncertainty in how
much entropy is in seed material.  You try for a conservative lower
bound, but what you actually care about is an attacker's uncertainty
about the key material, and that's subjective.

So it's actually an important design criterion to allow lots of headroom
in entropy buffer sizes.  Don't even *try* to store n bits of entropy
in an n-bit buffer, but allow at least a 2:1 margin.

This can be seen in the Hash_DRBG design.  It wants 256 bits of
entropy, but uses an 888-bit "seed length" internally.  (It actually
uses two such variables, V and C, but you can see that C is
regenerated from V at each seeding, so there's an 888-bit bottleneck.)
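
For reference, the Hash_DRBG instantiate step (quoting the structure
from SP800-90A; section 10.1.1.2 in Rev. 1, with Hash_df from section
10.3.1 and seedlen = 888 for SHA-512) is:

	seed_material = entropy_input || nonce || personalization_string
	V = Hash_df(seed_material, seedlen)
	C = Hash_df(0x00 || V, seedlen)

Since C is a deterministic function of V, the entire working state
collapses to the 888-bit V at seeding time.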


So my initial opinion is that you should just take the entire 8K
of timestamps *plus* the get_random_bytes output and use that for
seed material.  It'll get hashed twice to generate the internal
state V, but that's no biggie.

However!  The z/Arch PPNO instruction (since renamed PRNO) has a limit
of 512 bytes on the seed material, so we can't do that.  Some manual
compression is required.

Your idea of using SHA-512 is fine.

The two main things the current code does wrong are:
- There's no reason to compress things by XORing get_random_bytes()
  and SHA-512 output together.  Just concatenate them in the seed
  buffer.
- Calling generate_entropy with a length not a multiple of 64 bytes
  is an actively bad thing.  You're generating seed timestamps with
  *at least* 256 bits of entropy, then (in prng_sha512_instantiate())
  only using *at most* 256 bits of that.  That's just wasteful.
  Use the whole 64 bytes of SHA-512 output you just computed.

What you should do to instantiate is:
- Allocate a 64 + 64 + 48-byte buffer (512 + 512 + 384 bits).
- Generate a bunch of timestamps and hash them to generate
  the first 512 bits.
- Do it again for the second bunch.  (If you want only 384 bits
  of seed, generate fewer timestamps, but don't use less hash
  output.)
- Call get_random_bytes() for the last 48 bytes.  (Or whatever
  you think a reasonable security parameter is.  32 bytes is
  fine, too, but there's no harm in a little more *because the DRBG
  has enough state space to store the excess*.)
- Feed the entire 176-byte buffer to CPACF_PRNO_SHA512_DRNG_SEED.
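
In code, that's roughly the following, as it would sit inside
prng_sha512_instantiate().  (An untested sketch: I'm assuming
generate_entropy() is fixed to return the full hash output as
discussed above, and the field names around the cpacf_prno() call
are approximate.)

	u8 seed[64 + 64 + 48];	/* 512 + 512 + 384 bits */
	int ret;

	/* First 512 bits: hash one batch of timestamps. */
	ret = generate_entropy(seed, 64);
	if (ret != 64)
		goto outfree;

	/* Second 512 bits: hash an independent batch. */
	ret = generate_entropy(seed + 64, 64);
	if (ret != 64)
		goto outfree;

	/* Final 384 bits straight from the kernel's entropy pool. */
	get_random_bytes(seed + 128, 48);

	/* Feed the whole 176-byte buffer to the DRBG as one seed. */
	cpacf_prno(CPACF_PRNO_SHA512_DRNG_SEED, &prng_data->prnows,
		   NULL, 0, seed, sizeof(seed));
	memzero_explicit(seed, sizeof(seed));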

There are other slight permutations of this, but that's the basic
idea.  It would actually be better to generate all the timestamps
in one big buffer and hash them twice (with different starting
state values; the output of pass #1 will do fine for the start
of pass #2) rather than independent passes, but the buffer is
already inconveniently large.
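
I.e. something like this, with hash_timestamps() as a hypothetical
helper that runs SHA-512 over the timestamp buffer starting from a
given 64-byte chaining value:

	/* Pass 1: standard SHA-512 IV -> first half of the seed. */
	hash_timestamps(seed, sha512_iv, tsbuf, sizeof(tsbuf));
	/* Pass 2: chain from pass 1's output -> second half. */
	hash_timestamps(seed + 64, seed, tsbuf, sizeof(tsbuf));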

But!  You could get creative with KIMD and maintain two states (you
have the buffer space for it, after all), generate timestamps a
page at a time, and hash the pages one at a time.  I don't know
the startup cost of KIMD and how much you save by providing a
contiguous buffer.
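
A sketch of what I mean (untested; assumes the KIMD parameter block
for SHA-512 is just the 64-byte chaining value, and the two helpers
are hypothetical stand-ins for picking two distinct starting values
and for the stcke timestamp loop):

	u64 st1[8], st2[8];	/* two SHA-512 chaining values */
	u8 *pg = (u8 *) __get_free_page(GFP_KERNEL);
	int i;

	if (!pg)
		return -ENOMEM;
	init_chaining_values(st1, st2);	/* two distinct IVs */
	for (i = 0; i < npages; i++) {
		fill_page_with_timestamps(pg);
		/* Hash the same page into both states. */
		cpacf_kimd(CPACF_KIMD_SHA_512, st1, pg, PAGE_SIZE);
		cpacf_kimd(CPACF_KIMD_SHA_512, st2, pg, PAGE_SIZE);
	}
	free_page((unsigned long) pg);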

One thing I note is that you don't bother initializing the hash[]
array in generate_entropy() before using it as the SHA-512 IV.
Not bad, but it deserves a comment.  As does the fact that you don't
finalize the hash.
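
Something like this would do (one possible wording):

	/*
	 * hash[] is deliberately left uninitialized: whatever happens
	 * to be on the stack serves as an arbitrary SHA-512 starting
	 * value, and can only add uncertainty, never remove it.  We
	 * also skip the final padding block: we are compressing
	 * timestamps into seed material, not computing an
	 * interoperable SHA-512 digest, so the raw chaining value is
	 * all we need.
	 */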


