On 2/12/19 8:24 AM, James Bottomley wrote:
> On Mon, 2019-02-11 at 19:50 -0700, Jens Axboe wrote:
>> On 2/11/19 7:13 PM, James Bottomley wrote:
>>> On Mon, 2019-02-11 at 09:31 -0700, Jens Axboe wrote:
>>>> On 2/11/19 9:28 AM, James Bottomley wrote:
>>>>> On Mon, 2019-02-11 at 08:46 -0700, Jens Axboe wrote:
>>>>>> On 2/11/19 8:42 AM, James Bottomley wrote:
>>>>>>> On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
>>>>>>>> On 2/11/19 8:25 AM, James Bottomley wrote:
>>>>>>>>> On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
>>>>>>>>>> On 2/10/19 9:25 AM, James Bottomley wrote:
>>>>>
>>>>> [...]
>>>>>>>>>>> That check wasn't changed by the code removal.
>>>>>>>>>>
>>>>>>>>>> As I said above, for sd. This isn't true for non-disks.
>>>>>>>>>
>>>>>>>>> Yes, but the behaviour above doesn't change across a switch to
>>>>>>>>> MQ, so I don't quite understand how it bisects back to that
>>>>>>>>> change. If we're not gathering entropy for the device now, we
>>>>>>>>> wouldn't have been before the switch, so the entropy
>>>>>>>>> characteristics shouldn't have changed.
>>>>>>>>
>>>>>>>> But it does, as I also wrote in that first email. The legacy
>>>>>>>> queue flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ
>>>>>>>> ones do not. Hence any non-sd device would previously ALWAYS
>>>>>>>> have ADD_RANDOM set, now none of them do. Also see the patch
>>>>>>>> I sent.
>>>>>>>
>>>>>>> So your theory is that the disk in question never gets to the
>>>>>>> rotational check? because the check will clear the flag if
>>>>>>> it's non-rotational and set it if it's not, so the default
>>>>>>> state of the flag shouldn't matter.
>>>>>>
>>>>>> No, my point is about non-disks, devices that aren't driven by
>>>>>> sd. The behavior for sd hasn't changed, as it sets/clears it
>>>>>> unconditionally.
>>>>>
>>>>> I agree, but I don't think any of them were significant entropy
>>>>> contributors before: things like nvme have always been outside of
>>>>> this and sr and st don't really contribute much to the seek load
>>>>> during boot because they're probed but not used by the boot
>>>>> sequence, so I can't see how they would cause this behaviour. I
>>>>> suppose it could be target probing, but even that seems unlikely
>>>>> because it should be dwarfed by the number of root disk reads
>>>>> during boot.
>>>>>
>>>>> For the rng to take an additional 5 minutes to initialize, we must
>>>>> have lost a significant entropy source somewhere.
>>>>
>>>> I agree it's not a significant amount of entropy, but even just one
>>>> bit could mean a long stall if that put us over the edge of just not
>>>> having enough for whatever is blocking on /dev/random. Mikael's boot
>>>> did have a CDROM, it's not impossible that the handful of commands we
>>>> end up doing to that device would have contributed enough entropy to
>>>> get the boot done without stalling for minutes.
>>>>
>>>> One way to know for sure, and that's if Mikael tests the patch.
>>>
>>> I think I've got the root cause. I have one system in my test bed
>>> exhibiting this behaviour. It turns out the disk in it has no
>>> characteristics VPD page. The 0xB1 VPD was a SBC-3 addition, so that's
>>> not surprising. However, the characteristics check bails before
>>> setting the flags, so it takes the default flag which has flipped.
>>>
>>> We can either fix this by setting the QUEUE_FLAG_ADD_RANDOM if there's
>>> no 0xB1 page or by setting the default as Jens proposed.
>>
>> I'd recommend just doing my patch, since that'll be the same behavior
>> that SCSI had before.
>
> I've got the history now, it's this patch
>
> Author: Xuewei Zhang <xueweiz@xxxxxxxxxx>
> Date:   Thu Sep 6 13:37:19 2018 -0700
>
>     scsi: sd: Contribute to randomness when running rotational device
>
> It added the else branch to the if (rot == 1). It's the position of
> that else branch which is wrong because not all disks have a SBC-3
> characteristics VPD page, so they're the ones under MQ which stop
> contributing entropy. Whichever patch we go with will need a fixes:
> for this.

Ah, makes sense. I'd say we're _probably_ fine just fixing that then, or
at least it should be two separate patches.

-- 
Jens Axboe
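
Illustrative note for readers of the thread: the root cause described above can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions, not the kernel's sd code. The types `struct queue` and `struct disk`, and the functions `queue_init`, `read_characteristics` and `read_characteristics_fixed`, are hypothetical stand-ins. The single boolean `add_random` stands in for QUEUE_FLAG_ADD_RANDOM, and `has_vpd_b1` models whether the disk returns the 0xB1 characteristics VPD page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; only one flag is modelled. */
struct queue {
    bool add_random;   /* models QUEUE_FLAG_ADD_RANDOM */
};

/* Hypothetical stand-in for a disk as seen by the characteristics check. */
struct disk {
    bool has_vpd_b1;   /* does the device return the 0xB1 VPD page? */
    bool rotational;   /* only meaningful when has_vpd_b1 is true */
};

/* Per the thread: legacy queues defaulted to ADD_RANDOM set, MQ ones do not. */
static struct queue queue_init(bool legacy_defaults)
{
    struct queue q = { .add_random = legacy_defaults };
    return q;
}

/* Models the check as described: it bails out before touching the flag
 * when the page is missing, so the queue default is what survives. */
static void read_characteristics(struct queue *q, const struct disk *d)
{
    if (!d->has_vpd_b1)
        return;
    q->add_random = d->rotational;   /* set if rotational, clear if not */
}

/* Models one of the two fixes discussed: set the flag when there is no
 * 0xB1 page, so the result no longer depends on the queue default. */
static void read_characteristics_fixed(struct queue *q, const struct disk *d)
{
    if (!d->has_vpd_b1) {
        q->add_random = true;
        return;
    }
    q->add_random = d->rotational;
}
```

In this model, a disk without the 0xB1 page ends up with `add_random` set under the legacy default and cleared under the MQ default, which matches the regression described in the thread. Changing the default back, as in the other proposed fix, corresponds to calling `queue_init(true)`.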