Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()

Andy Lutomirski <luto@xxxxxxxxxx> · Fri, 20 Sep 2019 10:52:30 -0700

On Fri, Sep 20, 2019 at 9:30 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, Sep 20, 2019 at 7:34 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >
> > What is this GRND_EXPLICIT thing?
>
> Your own email gives the explanation:
>
> > Linus, I disagree that blocking while waiting for randomness is an
> > error.  Sometimes you want to generate a key
>
> That's *exactly* why GRND_EXPLICIT needs to be done regardless.
>
> The keyword there is "Sometimes".
>
> But people currently use "getrandom(0)" when they DO NOT want a key,
> they just want some miscellaneous random numbers for some totally
> non-security-related reason.
>
> And that will continue. Exactly because the people who do not want a
> key by definition aren't thinking about it very hard.

I fully agree that this is a problem.  It's a problem we brought on
ourselves because we screwed up the ABI from the beginning.  The
question is what to do about it that doesn't cause its own set of
nasty problems.

> So GRND_EXPLICIT is there very much to make sure people who want true
> secure keys will say so, and five years from now we will not have the
> confusion between "Oh, I wasn't thinking about bootup". Because at a
> minimum, in the near future getrandom(0) will warn about the
> ambiguity. Or it will use some questionable jitter entropy that some
> real key users will look at sideways and go "I don't want that".

There are programs that call getrandom(0) *today* that expect secure
output.  openssl does a horrible dance in which it calls getentropy()
if available and falls back to syscall(__NR_getrandom, buf, buflen, 0)
otherwise.  We can't break this use case.  Changing the semantics of
getrandom(0) out from under them seems like the worst kind of ABI
break -- existing applications will *appear* to continue working but
will, in fact, become insecure.
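Roughly, that dance looks like the following C sketch. It is not OpenSSL's code: it uses a compile-time glibc check (getentropy() first appeared in glibc 2.25) where OpenSSL does its own detection, and the helper name secure_random_bytes() is made up for illustration. Both branches end up in getrandom() with flags == 0, so changing what 0 means silently changes what these callers get.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 25)
#include <sys/random.h>
#define HAVE_GETENTROPY 1
#endif
#endif

/* Hypothetical helper: fill buf with len secure random bytes.
 * Returns 0 on success, -1 with errno set on failure. */
static int secure_random_bytes(unsigned char *buf, size_t len)
{
    while (len > 0) {
#ifdef HAVE_GETENTROPY
        /* getentropy() fills the whole request but caps it at 256 bytes. */
        size_t n = len < 256 ? len : 256;
        if (getentropy(buf, n) != 0)
            return -1;
#else
        /* Raw syscall, flags == 0: block until the pool is initialized. */
        long r = syscall(SYS_getrandom, buf, len, 0);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        size_t n = (size_t)r;
#endif
        buf += n;
        len -= n;
    }
    return 0;
}
```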

IMO, from the beginning, we should have done this:

GRND_INSECURE: insecure.  always works.

GRND_SECURE_BLOCKING: does exactly what it says.

0: -EINVAL.
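To make the proposed flag handling concrete, here is a userspace model of just the validation step. It is a sketch only: the PROPOSED_* names and values are hypothetical, not kernel ABI, and the function decides a mode rather than producing any bytes.

```c
#include <errno.h>

/* Hypothetical flag values, for this model only. */
#define PROPOSED_GRND_INSECURE        0x1u
#define PROPOSED_GRND_SECURE_BLOCKING 0x2u

enum proposed_mode { MODE_INSECURE, MODE_SECURE_BLOCKING };

/* Map caller flags to a mode, or return -EINVAL. Exactly one flag must
 * be set; flags == 0 is rejected so nobody gets a default by accident. */
static int proposed_getrandom_mode(unsigned int flags, enum proposed_mode *mode)
{
    switch (flags) {
    case PROPOSED_GRND_INSECURE:
        *mode = MODE_INSECURE;        /* never blocks; may be weak early in boot */
        return 0;
    case PROPOSED_GRND_SECURE_BLOCKING:
        *mode = MODE_SECURE_BLOCKING; /* waits until the RNG is seeded */
        return 0;
    default:
        return -EINVAL;               /* covers 0, both bits, unknown bits */
    }
}
```

The design point is that the default case, including 0, is an error, so every caller has to state which guarantee it wants.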

Using it correctly would be obvious.  Something like GRND_EXPLICIT
would be a head-scratcher: people would have to look at the man page
and actually think about it, and it's still easy to get wrong:

getrandom(..., GRND_EXPLICIT): just fscking give me a number.  it
seems to work and it shuts up the warning

And we're back to square one.

I think that, given existing software, we should make two or three
changes to fix the basic problems here:

1. Add GRND_INSECURE: at least let new applications do the right thing
going forward.

2. Fix what is arguably a straight up kernel bug, not even an ABI
issue: when a user program is blocking in getrandom(..., 0), the
kernel happily sits there doing absolutely nothing and deadlocks the
system as a result.  This IMO isn't an ABI issue -- it's an
implementation problem.  How about we make getrandom() (probably
actually wait_for_random_bytes()) do something useful to try to seed
the RNG if the system is otherwise not doing IO.

3. Optionally, entirely in user code: Get glibc to add new *library*
functions: getentropy_secure_blocking() and getentropy_insecure() or
whatever they want to call them.  Deprecate getentropy().
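One way a library could implement item 3 on top of the existing syscall, sketched under assumptions: the two function names are the placeholders from the list above, not real glibc APIs; the insecure variant relies on a GRND_INSECURE flag (value 0x0004 assumed here; a flag by that name was later merged in Linux 5.6); and it falls back to /dev/urandom, which never blocks, when an older kernel rejects the flag with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef GRND_INSECURE
#define GRND_INSECURE 0x0004 /* assumed uapi value (Linux 5.6+) */
#endif

/* Loop over getrandom() until buf is full; 0 on success, -1 with errno set. */
static int fill_getrandom(unsigned char *buf, size_t len, unsigned int flags)
{
    while (len > 0) {
        long r = syscall(SYS_getrandom, buf, len, flags);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += r;
        len -= (size_t)r;
    }
    return 0;
}

/* Hypothetical: secure bytes; blocks until the kernel RNG is seeded. */
int getentropy_secure_blocking(void *buf, size_t len)
{
    return fill_getrandom(buf, len, 0);
}

/* Hypothetical: never blocks; output may be predictable early in boot. */
int getentropy_insecure(void *buf, size_t len)
{
    if (fill_getrandom(buf, len, GRND_INSECURE) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;
    /* Kernel does not know GRND_INSECURE: use /dev/urandom instead. */
    int fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t r = read(fd, p, len);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0) {
            if (r == 0)
                errno = EIO; /* unexpected EOF */
            close(fd);
            return -1;
        }
        p += r;
        len -= (size_t)r;
    }
    close(fd);
    return 0;
}
```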

I think #2 is critical.  Right now, suppose someone has a system that
needs to do a secure network request (a la Red Hat's Clevis).  I have
no idea what Clevis actually does, but it wouldn't be particularly
crazy to do a DH exchange or sign with an EC key to ask some network
server to help unlock a dm-crypt volume.  If the system does this at
boot, it needs to use getrandom(..., 0), GRND_EXPLICIT, or whatever,
because it NEEDS a secure random number.  No amount of ABI fiddling
will change this.  The kernel should *work* in this case rather than
deadlocking.
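Until the kernel side (item 2) is addressed, a boot-time service in that position can at least notice that it is about to block. A sketch, assuming Linux's documented GRND_NONBLOCK behaviour (getrandom() fails with EAGAIN while the pool is still uninitialized); crng_ready() and get_key_material() are hypothetical names. This does not avoid the wait; it only makes it visible in the logs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef GRND_NONBLOCK
#define GRND_NONBLOCK 0x0001
#endif

/* 1 if the RNG is seeded, 0 if getrandom(..., 0) would block now, -1 on error. */
static int crng_ready(void)
{
    unsigned char byte;
    for (;;) {
        long r = syscall(SYS_getrandom, &byte, 1, GRND_NONBLOCK);
        if (r == 1)
            return 1;
        if (r < 0 && errno == EINTR)
            continue;
        if (r < 0 && errno == EAGAIN)
            return 0;
        return -1;
    }
}

/* Fetch key material, warning first if the call is going to block. */
static int get_key_material(unsigned char *key, size_t len)
{
    if (crng_ready() == 0)
        fprintf(stderr, "RNG not seeded yet; getrandom() will block\n");
    while (len > 0) {
        /* flags == 0: wait for a seeded RNG, then return secure bytes. */
        long r = syscall(SYS_getrandom, key, len, 0);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        key += r;
        len -= (size_t)r;
    }
    return 0;
}
```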

--Andy