RE: [PATCH] blk-mq: set default elevator as deadline in case of hctx shared tagset

>
> On Wed, Apr 07, 2021 at 09:04:30AM +0100, John Garry wrote:
> > Reviewed-by: John Garry <john.garry@xxxxxxxxxx>
> >
> >
> > > On Tue, Apr 06, 2021 at 11:25:08PM +0100, John Garry wrote:
> > > > On 06/04/2021 04:19, Ming Lei wrote:
> > > >
> > > > Hi Ming,
> > > >
> > > > > Yanhui found that write performance degrades a lot after applying
> > > > > the hctx shared tagset on one test machine with megaraid_sas. It
> > > > > turns out this is caused by the none scheduler, which became the
> > > > > default elevator because of the hctx shared tagset patchset.
> > > > >
> > > > > Given that more SCSI HBAs will adopt the hctx shared tagset, the
> > > > > same performance degradation will affect them too.
> > > > >
> > > > > So keep the previous behavior by still using mq-deadline as the
> > > > > default for queues which use the hctx shared tagset, just like
> > > > > before.
> > > > I think that there are some SCSI HBAs which have nr_hw_queues > 1
> > > > and don't use the shared sbitmap - do you think that they would want
> > > > this as well (without knowing it)?

John - I have noted this and am discussing it internally.
This patch fixing the shared host tag behavior is good (and required to
keep the earlier behavior intact), but for <mpi3mr>, which is a true multi
hardware queue interface, I will update later.
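
For reference, as I read it, the patch makes the default elevator selection
treat a shared-tagset queue like a single hw queue device. A rough sketch of
the idea (not the exact hunk):

static struct elevator_type *elevator_get_default(struct request_queue *q)
{
	/* default to mq-deadline not only for a single hw queue device,
	 * but also when the tag set is shared across hctxs */
	if (q->nr_hw_queues != 1 &&
	    !blk_mq_is_sbitmap_shared(q->tag_set->flags))
		return NULL;

	return elevator_get(q, "mq-deadline", false);
}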
In general, most OS vendors recommend <mq-deadline> for rotational media and
<none> for non-rotational media. We would like to go with this method in the
<mpi3mr> driver.
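
Roughly what I mean, as a throwaway userspace illustration driven by the
usual sysfs attributes (the disk name below is only a placeholder, and this
is not mpi3mr code):

#include <stdio.h>

int main(int argc, char **argv)
{
	const char *disk = argc > 1 ? argv[1] : "sda";	/* placeholder */
	char path[128];
	int rot = '1';
	FILE *f;

	/* queue/rotational is "1" for spinning media, "0" otherwise */
	snprintf(path, sizeof(path), "/sys/block/%s/queue/rotational", disk);
	f = fopen(path, "r");
	if (f) {
		rot = fgetc(f);
		fclose(f);
	}

	/* mq-deadline for rotational media, none for the rest */
	snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", disk);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}
	fprintf(f, "%s\n", rot == '1' ? "mq-deadline" : "none");
	fclose(f);

	printf("%s: rotational=%c -> %s\n", disk, rot,
	       rot == '1' ? "mq-deadline" : "none");
	return 0;
}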


> > > I don't know, but none has been used for them since the beginning, so
> > > that is not a regression of the shared tagset - this one really is.
> >
> > It seems fine to revert to previous behavior when host_tagset is set.
> > I didn't check the results for this recently, but for the original
> > shared tagset patchset [0] I had:
> >
> > none sched:		2132K IOPS
> > mq-deadline sched:	2145K IOPS

On my local setup, too, I did not see much difference.

>
> BTW, Yanhui reported that sequential write on virtio-scsi drops by 40~70%
> in the VM, and the virtio-scsi device is backed by a file image on XFS
> over megaraid_sas. And the disk is actually an SSD, not an HDD. It could
> be worse in the case of a megaraid_sas HDD.

Ming - if we take the old megaraid_sas driver (without the host tag set
patch) and just toggle the io-scheduler between <none> and <mq-deadline>
through sysfs, does that also give a similar performance drop?

I think the performance drop with the <none> io scheduler might be due to
bio merging that is missing compared to mq-deadline. It may not be linked
to the shared host tag IO path.
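
One way to check that is to compare the merge counters in
/sys/block/<disk>/stat before and after the run, once with <none> and once
with <mq-deadline> set through sysfs. A trivial reader (the disk name is a
placeholder):

#include <stdio.h>

int main(int argc, char **argv)
{
	const char *disk = argc > 1 ? argv[1] : "sda";	/* placeholder */
	unsigned long long rd_ios, rd_merges, wr_ios, wr_merges, skip;
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/stat", disk);
	f = fopen(path, "r");
	if (!f) {
		perror(path);
		return 1;
	}
	/* field order: read I/Os, read merges, read sectors, read ticks,
	 * write I/Os, write merges, ... (see Documentation/block/stat) */
	if (fscanf(f, "%llu %llu %llu %llu %llu %llu",
		   &rd_ios, &rd_merges, &skip, &skip,
		   &wr_ios, &wr_merges) != 6) {
		fclose(f);
		return 1;
	}
	fclose(f);

	printf("%s: reads %llu (merges %llu), writes %llu (merges %llu)\n",
	       disk, rd_ios, rd_merges, wr_ios, wr_merges);
	return 0;
}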
Usually bio merging does not help a sequential workload when the back-end
is enterprise SSD/NVMe, but that is not always true. It is difficult to
find one io-scheduler that benefits every setup and workload.

I would like to reproduce a similar drop locally. I will check with you and
Yanhui about how to reproduce it (for my future reference, and because I
want to have a similar test in my performance BST).

Kashyap

>
> The same drop is observed on virtio-blk too.
>
> I didn't figure out a simple reproducer on the host side yet, but the
> performance data is pretty stable in the VM IO workload.
>
>
> Thanks,
> Ming
