On Wed, Sep 11, 2019 at 06:00:19PM +0100, Linus Torvalds wrote:
> [ 0.231255] random: get_random_bytes called from
> start_kernel+0x323/0x4f5 with crng_init=0
>
> and that's this code:
>
>     add_latent_entropy();
>     add_device_randomness(command_line, strlen(command_line));
>     boot_init_stack_canary();
>
> in particular, it's the boot_init_stack_canary() thing that asks for a
> random number for the canary.
>
> I don't actually see the 'crng init done' until much much later:
>
> [ 21.741125] random: crng init done

Yes, that's super early in the boot sequence.  IIRC the stack canary
gets reinitialized later (or maybe it was only for the other CPU's in
SMP mode; I don't recall the details off the top of my head).

I think this one always fails, and perhaps we should have a way of
suppressing it --- but that's correct: the in-kernel interface doesn't
block.

The /dev/urandom device doesn't block either, despite security
eggheads continually asking me to change it to block a la
getrandom(2), but I have always pushed back because I *know* changing
/dev/urandom to block would be asking for userspace regressions.

The compromise we came up with was that since getrandom(2) is a new
interface, we could make this have the behavior that the security
heads wanted, which is to make blocking unconditional, since the
theory was that *this* interface would be sane, and that userspace
applications which used it too early were buggy, and we could make it
*their* problem.

People have suggested adding a new getrandom flag,
GRND_I_KNOW_THIS_IS_INSECURE, or some such, which wouldn't block and
would return "best efforts" randomness.  I haven't been super
enthusiastic about such a flag because I *know* it would be insecure.
However, the next time a massive security bug shows up on the front
pages of the Wall Street Journal, or on some web site such as
https://factorable.net, it won't be the kernel's fault since the flag
will be GRND_INSECURE_BROKEN_APPLICATION, or some such.
It doesn't really solve the problem, though.

> But this does show that
>
>  (a) we have the same issue in the kernel, and we don't block there

Ultimately, I think the only right answer is to make it the
bootloader's responsibility to get us some decent entropy at boot
time.  There are patches to allow ARM systems to pass in entropy via
the device tree.  And in theory (assuming you trust the UEFI BIOS ---
stop laughing in the back!) we can use that to get entropy, which will
solve the problem for UEFI boot systems.  I've been talking to Ron
Minnich about trying to get this support into the NERF bootloader, at
which point new servers from the Open Compute Project will have a
solution as well.  (We can probably also get solutions for Chrome OS
devices, since those have TPM-like devices which are trusted to have a
competently engineered hardware RNG --- I'm not sure I would trust all
TPM devices in commodity hardware, but again, at least we can shift
blame off of the kernel.  :-P)

Still, these are all point solutions, and don't really solve the
problem on older systems, or non-x86 systems.

>  (b) initializing the crng really can be a timing problem
>
> The interrupt thing is only going to get worse as disks turn into
> ssd's and some of them end up using polling rather than interrupts..
> So we're likely to see _fewer_ interrupts in the future, not more.

Yeah, agreed.  Maybe we should have an "insecure_randomness" boot
option which blindly forces the CRNG to be initialized at boot, so
that at least people can get to a command line, if insecurely?  I
don't have any good ideas about how to solve this problem in general.
:-( :-( :-(

					- Ted
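
For readers following along: the blocking behavior of getrandom(2)
discussed above can be observed from userspace with the existing
GRND_NONBLOCK flag, which makes the call fail with EAGAIN instead of
sleeping while the CRNG is uninitialized.  The sketch below is only an
illustration and assumes glibc 2.25 or later for <sys/random.h>; the
helper name try_get_seed() and the seed_status enum are hypothetical
and do not come from this thread or from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>   /* getrandom(), GRND_NONBLOCK (glibc >= 2.25) */
#include <sys/types.h>    /* ssize_t */

/* Hypothetical result type for one non-blocking attempt. */
enum seed_status {
	SEED_OK,        /* buffer filled; the CRNG was already initialized */
	SEED_NOT_READY, /* CRNG not initialized yet (EAGAIN); nothing written */
	SEED_ERROR      /* any other failure; errno describes it */
};

/*
 * Hypothetical helper: ask for len bytes without ever sleeping.
 * Per getrandom(2), requests of up to 256 bytes are returned in full
 * once the pool is initialized, so len is capped at 256 here.
 */
enum seed_status try_get_seed(unsigned char *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	assert(len <= 256);

	n = getrandom(buf, len, GRND_NONBLOCK);
	if (n == (ssize_t)len)
		return SEED_OK;
	if (n < 0 && errno == EAGAIN)
		return SEED_NOT_READY;
	return SEED_ERROR;
}
```

A caller that gets SEED_NOT_READY has to decide for itself whether to
wait, retry later, or fail; passing flags of 0 instead of GRND_NONBLOCK
gives the behavior described above, where the call sleeps until the
CRNG has been initialized.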