Re: Linux 5.3-rc8

"Theodore Y. Ts'o" <tytso@xxxxxxx> · Wed, 11 Sep 2019 13:36:24 -0400

On Wed, Sep 11, 2019 at 06:00:19PM +0100, Linus Torvalds wrote:
>     [    0.231255] random: get_random_bytes called from
> start_kernel+0x323/0x4f5 with crng_init=0
> 
> and that's this code:
> 
>         add_latent_entropy();
>         add_device_randomness(command_line, strlen(command_line));
>         boot_init_stack_canary();
> 
> in particular, it's the boot_init_stack_canary() thing that asks for a
> random number for the canary.
> 
> I don't actually see the 'crng init done' until much much later:
> 
>     [   21.741125] random: crng init done

Yes, that's super early in the boot sequence.  IIRC the stack canary
gets reinitialized later (or maybe it was only for the other CPUs in
SMP mode; I don't recall the details off the top of my head).

I think this one always fails, and perhaps we should have a way of
suppressing it --- but that's correct: the in-kernel interface doesn't
block.

The /dev/urandom device doesn't block either, despite security
eggheads continually asking me to change it to block a la getrandom(2),
but I have always pushed back because I *know* changing
/dev/urandom to block would be asking for userspace regressions.

The compromise we came up with was that since getrandom(2) is a new
interface, we could make this have the behavior that the security
heads wanted, which is to make blocking unconditional, since the
theory was that *this* interface would be sane, and that userspace
applications which used it too early were buggy, and we could make it
*their* problem.
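
As a rough sketch of what this looks like from userspace: with flags
set to 0, getrandom(2) sleeps until the CRNG is initialized, while the
existing GRND_NONBLOCK flag makes it fail with EAGAIN instead.  The
helper name try_getrandom() below is made up for this example, and the
sketch assumes a glibc new enough to ship <sys/random.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper: request len random bytes without sleeping.
 * Returns  1 if buf was completely filled,
 *          0 if the CRNG is not initialized yet (the case in which
 *            getrandom(buf, len, 0) would block),
 *         -1 on any other error.
 */
int try_getrandom(void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(buf != NULL);

	while (len > 0) {
		ssize_t n = getrandom(p, len, GRND_NONBLOCK);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			if (errno == EAGAIN)
				return 0;	/* CRNG not ready yet */
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 1;
}
```

An application that sees 0 here has to decide for itself whether to
wait and retry or to fail; it never gets handed bytes from an
uninitialized CRNG.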

People have suggested adding a new getrandom flag, GRND_I_KNOW_THIS_IS_INSECURE,
or some such, which wouldn't block and would return "best efforts"
randomness.  I haven't been super enthusiastic about such a flag
because I *know* it would be insecure.  However, the next time a massive
security bug shows up on the front pages of the Wall Street Journal,
or on some web site such as https://factorable.net, it won't be the kernel's fault
since the flag will be GRND_INSECURE_BROKEN_APPLICATION, or some such.
It doesn't really solve the problem, though.

> But this does show that
> 
>  (a) we have the same issue in the kernel, and we don't block there

Ultimately, I think the only right answer is to make it the
bootloader's responsibility to get us some decent entropy at boot
time.  There are patches to allow ARM systems to pass in entropy via
the device tree.  And in theory (assuming you trust the UEFI BIOS ---
stop laughing in the back!) we can use that to get entropy, which will
solve the problem for UEFI boot systems.  I've been talking to Ron
Minnich about trying to get this support into the NERF bootloader, at
which point new servers from the Open Compute Project will have a
solution as well.  (We can probably also get solutions for Chrome OS
devices, since those have TPM-like devices which are trusted to have a
competently engineered hardware RNG --- I'm not sure I would trust all
TPM devices in commodity hardware, but again, at least we can shift
blame off of the kernel.  :-P)

Still, these are all point solutions, and don't really solve the
problem on older systems, or non-x86 systems.

>  (b) initializing the crng really can be a timing problem
> 
> The interrupt thing is only going to get worse as disks turn into
> ssd's and some of them end up using polling rather than interrupts..
> So we're likely to see _fewer_ interrupts in the future, not more.

Yeah, agreed.  Maybe we should have an "insecure_randomness" boot
option which blindly forces the CRNG to be initialized at boot, so
that at least people can get to a command line, if insecurely?  I
don't have any good ideas about how to solve this problem in general.
:-( :-( :-(

						- Ted