On Tue, 2 Jun 2020 at 01:45, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 6/1/20 5:37 PM, Damien Le Moal wrote:
> > On Mon, 2020-06-01 at 14:53 +0200, Ulf Hansson wrote:
> >> On Mon, 1 Jun 2020 at 13:58, Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >>> On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
> >>>> On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >>>>> On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> >>>>>> The Kyber block scheduler is not suitable for single hardware
> >>>>>> queue devices, so add a new flag for single hardware queue
> >>>>>> devices and add that to the deadline and BFQ schedulers
> >>>>>> so the Kyber scheduler will not be selected for single queue
> >>>>>> devices.
> >>>>>
> >>>>> The above may not be true for some single hw queue high performance
> >>>>> HBAs (such as megasas), which can get better performance from none,
> >>>>> so it is reasonable to expect good performance from kyber too; see
> >>>>> 6ce3dd6eec11 ("blk-mq: issue directly if hw queue isn't busy in case
> >>>>> of 'none'") and the following link:
> >>>>>
> >>>>> https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@xxxxxxxxxx/
> >>>>
> >>>> I see, but isn't the case rather that none is preferred and kyber
> >>>> gives the same characteristics because it's not standing in the way
> >>>> as much?
> >>>
> >>> Kyber has its own characteristics, such as fair read & write handling
> >>> and better IO merging. And the decision on the scheduler isn't only
> >>> related to the device, but also to the workload.
> >>>
> >>>> It looks like we should add a special flag for these devices with
> >>>> very fast single queues so they can say "I prefer none", do you
> >>>> agree?
> >>>
> >>> I am not sure it is easy to add such a flag, because it isn't only
> >>> related to the HBA, but also to the attached disks.
> >>>
> >>
> >> In general I don't mind the idea of giving hints from lower layer
> >> block devices about what kind of scheduling algorithm could make
> >> sense (as long as it's at a reasonable granularity).
> >>
> >> If I understand your point correctly, what you are saying is that it
> >> isn't easy or even possible for some block device HWs. However, that
> >> should be fine, as it wouldn't be mandatory to set this kind of flag;
> >> instead it could help where we see it fit, right?
> >
> > The elevator features flag was implemented not as a hint, but as a hard
> > requirement for elevators that are needed (mandatory) for a particular
> > device type to operate correctly. By correct operation, I mean "no IO
> > errors or weird behavior resulting in errors such as timeouts". Until
> > now, the only hard requirement we have is for zoned block devices,
> > which need mq-deadline to guarantee in-order dispatch of write commands
> > (for sequential zone writes).
> >
> > We definitely could add hint flags to better help the block layer
> > decide on the default optimal elevator for a particular device type,
> > but as is, the elevator features will completely prevent the use of any
> > other elevator that does not have the feature set. Those elevators will
> > not be seen in /sys/block/<dev>/queue/scheduler. This may be a little
> > too much for a hint, as opposed to a hard requirement.
> >
> > Furthermore, as Ming said, this depends on the HBA too rather than just
> > the device itself. E.g. the smartpqi driver (Microsemi SAS HBAs)
> > exposes single hard-disks as well as fast RAID arrays as multi-queue
> > devices.
> > While kyber may make sense for the latter, it probably does
> > not make much sense for the former.
> >
> > In-kernel vs udev rules for setting the optimal elevator for a
> > particular device type should also be considered.
>
> Agree, the elevator flags are hard requirements, which doesn't match
> what this patch is trying to do. There's absolutely nothing wrong with
> using none or kyber on single queue devices, hence it should be
> possible to configure them as such.

I agree, the elevator flags as they stand don't work for giving hints
from lower block layers [0]. However, I still think it would be worth
exploring the idea that is brought up here.

The point is, even if it's perfectly fine to use kyber for MMC/SD, for
example, it would make little sense, as BFQ performs better on this type
of single queue storage device. So, why rely solely on userspace udev
rules [1], when we can help, in-kernel, to decide on the best
configuration?

Kind regards
Uffe
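[0] Roughly the current mechanism, for readers following along; this is
a paraphrased sketch based on block/elevator.c, include/linux/elevator.h
and drivers/scsi/sd_zbc.c around v5.7, not a verbatim copy:

    /* Driver side: a zoned disk declares a hard requirement: */
    blk_queue_required_elevator_features(sdkp->disk->queue,
                                         ELEVATOR_F_ZBD_SEQ_WRITE);

    /* Elevator side: mq-deadline advertises that it provides it: */
    static struct elevator_type mq_deadline = {
            /* ... ops and attrs elided ... */
            .elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE,
    };

    /* The match is all-or-nothing: an elevator missing any required
     * bit is rejected and not even listed in
     * /sys/block/<dev>/queue/scheduler: */
    static bool elv_support_features(unsigned int elv_features,
                                     unsigned int required_features)
    {
            return (elv_features & required_features) == required_features;
    }

A hint would presumably need a separate "preferred features" mask that
only steers the default selection and never hides an elevator.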
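[1] For reference, the kind of udev rule I mean; the file name and the
device matches below are illustrative, not taken from any particular
distro:

    # /etc/udev/rules.d/60-iosched.rules (hypothetical)
    # Single-queue MMC/SD cards: default to BFQ
    ACTION=="add|change", KERNEL=="mmcblk[0-9]*", ATTR{queue/scheduler}="bfq"
    # Rotational SCSI/SATA disks: default to BFQ as well
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"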