On 2/11/19 7:13 PM, James Bottomley wrote:
> On Mon, 2019-02-11 at 09:31 -0700, Jens Axboe wrote:
>> On 2/11/19 9:28 AM, James Bottomley wrote:
>>> On Mon, 2019-02-11 at 08:46 -0700, Jens Axboe wrote:
>>>> On 2/11/19 8:42 AM, James Bottomley wrote:
>>>>> On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
>>>>>> On 2/11/19 8:25 AM, James Bottomley wrote:
>>>>>>> On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
>>>>>>>> On 2/10/19 9:25 AM, James Bottomley wrote:
>>>
>>> [...]
>>>>>>>>> That check wasn't changed by the code removal.
>>>>>>>>
>>>>>>>> As I said above, for sd. This isn't true for non-disks.
>>>>>>>
>>>>>>> Yes, but the behaviour above doesn't change across a switch
>>>>>>> to MQ, so I don't quite understand how it bisects back to
>>>>>>> that change. If we're not gathering entropy for the device
>>>>>>> now, we wouldn't have been before the switch, so the
>>>>>>> entropy characteristics shouldn't have changed.
>>>>>>
>>>>>> But it does, as I also wrote in that first email. The legacy
>>>>>> queue flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ
>>>>>> ones do not. Hence any non-sd device would previously ALWAYS
>>>>>> have ADD_RANDOM set, now none of them do. Also see the patch
>>>>>> I sent.
>>>>>
>>>>> So your theory is that the disk in question never gets to the
>>>>> rotational check? because the check will clear the flag if
>>>>> it's non-rotational and set it if it's not, so the default
>>>>> state of the flag shouldn't matter.
>>>>
>>>> No, my point is about non-disks, devices that aren't driven by
>>>> sd. The behavior for sd hasn't changed, as it sets/clears it
>>>> unconditionally.
>>>
>>> I agree, but I don't think any of them were significant entropy
>>> contributors before: things like nvme have always been outside of
>>> this and sr and st don't really contribute much to the seek load
>>> during boot because they're probed but not used by the boot
>>> sequence, so I can't see how they would cause this behaviour. I
>>> suppose it could be target probing, but even that seems unlikely
>>> because it should be dwarfed by the number of root disk reads
>>> during boot.
>>>
>>> For the rng to take an additional 5 minutes to initialize, we must
>>> have lost a significant entropy source somewhere.
>>
>> I agree it's not a significant amount of entropy, but even just one
>> bit could mean a long stall if that put us over the edge of just not
>> having enough for whatever is blocking on /dev/random. Mikael's boot
>> did have a CDROM, it's not impossible that the handful of commands we
>> end up doing to that device would have contributed enough entropy to
>> get the boot done without stalling for minutes.
>>
>> One way to know for sure, and that's if Mikael tests the patch.
>
> I think I've got the root cause. I have one system in my test bed
> exhibiting this behaviour. It turns out the disk in it has no
> characteristics VPD page. The 0xB1 VPD was a SBC-3 addition, so that's
> not surprising. However, the characteristics check bails before
> setting the flags, so it takes the default flag which has flipped.
>
> We can either fix this by setting the QUEUE_FLAG_ADD_RANDOM if there's
> no 0xB1 page or by setting the default as Jens proposed.

I'd recommend just doing my patch, since that'll be the same behavior
that SCSI had before.

-- 
Jens Axboe
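
For illustration, the flag behaviour discussed in this thread can be
modelled in a few lines of standalone C. This is only a sketch under
stated assumptions, not kernel code: struct queue_model, enum vpd_b1 and
all four functions are hypothetical names made up for this example, and
the only thing taken from the thread is the logic (default flag value,
early bail-out when the 0xB1 page is missing, and the proposed fix of
setting the flag in that case).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the kernel's definitions. */
enum vpd_b1 { VPD_B1_MISSING, VPD_B1_ROTATIONAL, VPD_B1_NON_ROTATIONAL };

struct queue_model {
    bool add_random;    /* models QUEUE_FLAG_ADD_RANDOM */
};

/* Legacy queues started with the flag set; MQ queues start with it clear. */
struct queue_model queue_init(bool legacy_default)
{
    struct queue_model q = { .add_random = legacy_default };
    return q;
}

/* Characteristics check as described in the thread: bail out early when
 * the disk has no 0xB1 page, leaving whatever default the queue had. */
void read_characteristics(struct queue_model *q, enum vpd_b1 page)
{
    if (page == VPD_B1_MISSING)
        return;         /* default flag is left untouched */
    q->add_random = (page == VPD_B1_ROTATIONAL);
}

/* Model of one proposed fix: set the flag when there is no 0xB1 page. */
void read_characteristics_fixed(struct queue_model *q, enum vpd_b1 page)
{
    if (page == VPD_B1_MISSING) {
        q->add_random = true;
        return;
    }
    q->add_random = (page == VPD_B1_ROTATIONAL);
}

/* Returns whether the modelled disk ends up contributing entropy. */
bool contributes_entropy(bool legacy_default, enum vpd_b1 page, bool fixed)
{
    struct queue_model q = queue_init(legacy_default);

    if (fixed)
        read_characteristics_fixed(&q, page);
    else
        read_characteristics(&q, page);
    return q.add_random;
}
```

In this model a disk without the 0xB1 page contributes entropy with the
legacy default (contributes_entropy(true, VPD_B1_MISSING, false) is
true) but not with the MQ default (contributes_entropy(false,
VPD_B1_MISSING, false) is false), which is the regression described
above. Either restoring the default or setting the flag on the bail-out
path brings the old result back in the model.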