Re: Linux 5.3-rc8

"Ahmed S. Darwish" <darwish.07@xxxxxxxxx> · Sat, 14 Sep 2019 23:11:26 +0200

Hi,

On Sat, Sep 14, 2019 at 09:30:19AM -0700, Linus Torvalds wrote:
> On Sat, Sep 14, 2019 at 8:02 AM Ahmed S. Darwish <darwish.07@xxxxxxxxx> wrote:
> >
> > On Thu, Sep 12, 2019 at 12:34:45PM +0100, Linus Torvalds wrote:
> > >
> > > An alternative might be to make getrandom() just return an error
> > > instead of waiting. Sure, fill the buffer with "as random as we can"
> > > stuff, but then return -EINVAL because you called us too early.
> >
> > ACK, that's probably _the_ most sensible approach. Only caveat is
> > the slight change in user-space API semantics though...
> >
> > For example, this breaks the just released systemd-random-seed(8)
> > as it _explicitly_ requests blocking behvior from getrandom() here:
> >
> 
> Actually, I would argue that the "don't ever block, instead fill
> buffer and return error instead" fixes this broken case.
> 
> >     => src/random-seed/random-seed.c:
> >     /*
> >      * Let's make this whole job asynchronous, i.e. let's make
> >      * ourselves a barrier for proper initialization of the
> >      * random pool.
> >      */
> >      k = getrandom(buf, buf_size, GRND_NONBLOCK);
> >      if (k < 0 && errno == EAGAIN && synchronous) {
> >          log_notice("Kernel entropy pool is not initialized yet, "
> >                     "waiting until it is.");
> >
> >          k = getrandom(buf, buf_size, 0); /* retry synchronously */
> >      }
> 
> Yeah, the above is yet another example of completely broken garbage.
> 
> You can't just wait and block at boot. That is simply 100%
> unacceptable, and always has been, exactly because that may
> potentially mean waiting forever since you didn't do anything that
> actually is likely to add any entropy.
>

ACK, the systemd commit which introduced that code also does:

   => 26ded5570994 (random-seed: rework systemd-random-seed.service..)
    [...]
    --- a/units/systemd-random-seed.service.in
    +++ b/units/systemd-random-seed.service.in
    @@ -22,4 +22,9 @@ Type=oneshot
    RemainAfterExit=yes
    ExecStart=@rootlibexecdir@/systemd-random-seed load
    ExecStop=@rootlibexecdir@/systemd-random-seed save
   -TimeoutSec=30s
   +
   +# This service waits until the kernel's entropy pool is
   +# initialized, and may be used as ordering barrier for service
   +# that require an initialized entropy pool. Since initialization
   +# can take a while on entropy-starved systems, let's increase the
   +# time-out substantially here.
   +TimeoutSec=10min

This 10min wait thing is really broken... it's basically "forever".

> >      if (k < 0) {
> >          log_debug_errno(errno, "Failed to read random data with "
> >                          "getrandom(), falling back to "
> >                          "/dev/urandom: %m");
> 
> At least it gets a log message.
> 
> So I think the right thing to do is to just make getrandom() return
> -EINVAL, and refuse to block.
> 
> As mentioned, this has already historically been a huge issue on
> embedded devices, and with disks turnign not just to NVMe but to
> actual polling nvdimm/xpoint/flash, the amount of true "entropy"
> randomness we can give at boot is very questionable.
>

ACK.

Moreover, and as a result of all that, distributions are now officially
*duct-taping* the problem:

    https://www.debian.org/releases/buster/amd64/release-notes/ch-information.en.html#entropy-starvation

    5.1.4. Daemons fail to start or system appears to hang during boot

    Due to systemd needing entropy during boot and the kernel treating
    such calls as blocking when available entropy is low, the system
    may hang for minutes to hours until the randomness subsystem is
    sufficiently initialized (random: crng init done).

"the system may hang for minuts to hours"...

> We can (and will) continue to do a best-effort thing (including very
> much using rdread and friends), but the whole "wait for entropy"
> simply *must* stop.
> 
> > I've sent an RFC patch at [1].
> >
> > [1] https://lkml.kernel.org/r/20190914122500.GA1425@darwi-home-pc
> 
> Looks reasonable to me. Except I'd just make it simpler and make it a
> big WARN_ON_ONCE(), which is a lot harder to miss than pr_notice().
> Make it clear that it is a *bug* if user space thinks it should wait
> at boot time.
> 
> Also, we might even want to just fill the buffer and return 0 at that
> point, to make sure that even more broken user space doesn't then try
> to sleep manually and turn it into a "I'll wait myself" loop.
>

ACK, I'll send an RFC v2, returning buflen, and so on..

/me will enjoy the popcorn from all the to-be-reported WARN_ON()s
on distribution mailing lists ;-)

>                  Linus

thanks,

-- 
darwi
http://darwish.chasingpointers.com