On Tue, Sep 17, 2019 at 09:33:40AM +0200, Martin Steigerwald wrote:
> Willy Tarreau - 17.09.19, 07:24:38 CEST:
> > On Mon, Sep 16, 2019 at 06:46:07PM -0700, Matthew Garrett wrote:
> > > >Well, the patch actually made getrandom() return en error too, but
> > > >you seem more interested in the hypotheticals than in arguing
> > > >actualities.
> > >
> > > If you want to be safe, terminate the process.
> >
> > This is an interesting approach. At least it will cause bug reports in
> > application using getrandom() in an unreliable way and they will
> > check for other options. Because one of the issues with systems that
> > do not finish to boot is that usually the user doesn't know what
> > process is hanging.

I would be happy with a change which changes getrandom(0) to send a
kill -9 to the process if it is called too early, with a new flag,
getrandom(GRND_BLOCK), which blocks until entropy is available. That
leaves it up to the application developer to decide what behavior they
want.

Userspace applications which want to do something more sophisticated
could set a timer which will cause getrandom(GRND_BLOCK) to return
with EINTR (or the signal handler could use longjmp; whatever) to
abort and do something else, like calling random_r if it's for some
pathetic use of random numbers like MIT-MAGIC-COOKIE.

> A userspace process could just poll on the kernel by forking a process
> to use getrandom() and waiting until it does not get terminated anymore.
> And then it would still hang.

So.... I'm not too worried about that, because if a process is
determined to do something stupid, they can always do something
stupid.
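For concreteness, the timer-plus-EINTR pattern described above might look
roughly like the sketch below. GRND_BLOCK is only a proposal in this
message and is not a real flag, so the sketch assumes plain getrandom()
with flags 0, which per getrandom(2) blocks until the CRNG is
initialized. The helper name fill_random(), the on_alarm() handler, the
return convention, and the time/pid seed for the random_r fallback are
all made up for illustration; random_r and initstate_r are glibc
extensions.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t timed_out;

/* The handler only records the timeout; its real effect is that the
 * blocked getrandom() call returns -1 with errno set to EINTR. */
static void on_alarm(int sig) { (void)sig; timed_out = 1; }

/*
 * Hypothetical helper (not from the original message).
 * timeout_sec must be > 0, since alarm(0) would cancel the timer.
 * Returns  1 if buf holds kernel CRNG output,
 *          0 if the wait timed out and buf holds weak random_r() output,
 *         -1 on an unexpected error.
 * Known limitation: if SIGALRM fires just before getrandom() is entered,
 * the call is not interrupted; a real implementation would need to
 * close that race.
 */
static int fill_random(unsigned char *buf, size_t len, unsigned timeout_sec)
{
    struct sigaction sa, old;
    size_t done = 0;
    int err = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;          /* no SA_RESTART: we want EINTR */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old) != 0)
        return -1;

    timed_out = 0;
    alarm(timeout_sec);
    while (done < len) {
        ssize_t n = getrandom(buf + done, len - done, 0);
        if (n >= 0) { done += (size_t)n; continue; }
        if (errno == EINTR && !timed_out)
            continue;                  /* unrelated signal: just retry */
        err = errno;
        break;
    }
    alarm(0);                          /* cancel any pending timer */
    sigaction(SIGALRM, &old, NULL);

    if (done == len)
        return 1;
    if (err != EINTR)
        return -1;

    /* Timed out: fall back to a non-cryptographic generator. This is
     * only acceptable for uses that do not need real security. */
    struct random_data rd;
    char state[64];
    memset(&rd, 0, sizeof rd);         /* glibc requires a zeroed struct */
    if (initstate_r((unsigned)time(NULL) ^ (unsigned)getpid(),
                    state, sizeof state, &rd) != 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        int32_t r;
        random_r(&rd, &r);
        buf[i] = (unsigned char)(r >> 16);
    }
    return 0;
}
```

A caller would do something like fill_random(cookie, sizeof cookie, 5)
and treat a return value of 0 as "this is not a secure random number",
which is the decision the proposal wants to leave with the application
developer.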
That kind of polling could potentially be a problem, as would
GRND_BLOCK. Suppose an application author decides to do something to
wait for real randomness, because in the good judgement of the
application author, it d*mned well needs real security, because
otherwise an attacker could, say, force a launch of nuclear weapons and
cause world war III. If some small 3rd-tier distro then decides to
repurpose that application for some other use, and puts it in early
boot, it's possible that a user will report it as a "regression", and
we'll be back to the question of whether we revert a performance
optimization patch.

There are only two ways out of this mess. The first option is we take
functionality away from a userspace author who Really Wants A Secure
Random Number Generator. And there are an awful lot of programs which
really want secure crypto, because this is not a hypothetical. The
result in "Mining your P's and Q's" did happen before. If we forget
the history, we are doomed to repeat it.

The only other way is that we need to try to get the CRNG initialized
securely in early boot, before we let userspace start. If we do it
early enough, we can also make the kernel facilities like KASLR and
Stack Canaries more secure. And this is *doable*, at least for most
common platforms. We can leverage UEFI; we can try to use the TPM's
random number generator, etc. It won't help so much for certain
brain-dead architectures, like MIPS and ARM, but if they are used for
embedded use cases, it will be caught before the product is released
for consumer use. And this is where blocking is *way* better than a
big fat warning, or sleeping for 15 seconds, both of which can easily
get missed in the embedded case.

If we can fix this for traditional servers/desktops/laptops, then
users won't be complaining to Linus, and I think we can all be happy.

Regards,

					- Ted