On Mon, Mar 2, 2020 at 3:21 PM John Garry <john.garry@xxxxxxxxxx> wrote:
>
>
> >> static inline void
> >> megasas_get_msix_index(struct megasas_instance *instance,
> >>                        struct scsi_cmnd *scmd,
> >>                        struct megasas_cmd_fusion *cmd,
> >>                        u8 data_arms)
> >> {
> >> ...
> >>
> >>         sdev_busy = atomic_read(&hctx->nr_active);
> >>
> >>         if (instance->perf_mode == MR_BALANCED_PERF_MODE &&
> >>             sdev_busy > (data_arms * MR_DEVICE_HIGH_IOPS_DEPTH))
> >>                 cmd->request_desc->SCSIIO.MSIxIndex =
> >>                         mega_mod64(...);
> >>         else if (instance->msix_load_balance)
> >>                 cmd->request_desc->SCSIIO.MSIxIndex =
> >>                         (mega_mod64(...,
> >>                                 instance->msix_vectors));
> >>
> >> Will this make a difference? I am not sure. Maybe, on this basis,
> >> megaraid_sas is not a good candidate to change to expose multiple
> >> queues.
> >>
> >> Ignoring that for a moment, since we no longer keep a host busy count,
> >> and I figure that we don't want to go back to using
> >> scsi_device.device_busy, is the judgement of hctx->nr_active ok to use
> >> to decide whether to use these performance queues?
> >>
> > Personally, I wonder if the current implementation of high-IOPs queues
> > makes sense with multiqueue.
> > Thing is, the current high-IOPs queue mechanism of shifting I/O to
> > another internal queue doesn't align nicely with the blk-mq
> > architecture.
>
> Right, we should not be hiding HW queues from blk-mq like this. This
> breaks the symmetry. Maybe we can move this functionality to blk-mq, but
> I doubt that this is a common use case.

We added this concept of extra queues for the latest generation of
megaraid_sas controllers for performance reasons. Here is some background:
https://lore.kernel.org/lkml/20180829084618.GA24765@ming.t460p/t/

We worked with the community on an interface that lets managed interrupts
(for the low-latency queues) and non-managed interrupts (for the high-IOPs
queues) coexist.

> >
> > What we _do_ have, though, is a 'poll' queue mechanism, allowing to
> > separate out one (or several) queues for polling, to allow for ...
> > indeed, high-IOPs.
>
> Any examples or references for this?
>
> > So it would be interesting to figure out if we don't get similar
> > performance by using the 'poll' queue implementation from blk-mq
> > instead of the current one.
>
> I thought that this driver or mpt3sas already used a polling mode.
>
> And for these low-latency queues, I figure that the issue is not just
> polling vs interrupt, but indeed how fast the HW queue can consume SQEs.
> A HW queue may only be able to consume at a limited rate - that's why we
> segregate.

Yes, there is no polling in any of the HW queues. The high-IOPs queues have
interrupt coalescing enabled, whereas the low-latency queues do not. The
megaraid_sas driver chooses which of the two sets of queues to use depending
on the workload: for a latency-oriented workload it uses the low-latency
queues, and for an IOPs-oriented profile it uses the high-IOPs queues.
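Roughly, the per-command selection looks like the sketch below. This is only
an illustration - the queue counts and the depth threshold are made-up
values, not what the driver actually uses; the real logic is in
megasas_get_msix_index(), quoted above.

/*
 * Illustrative user-space model of the "balanced" queue selection policy.
 * All constants here are invented for the example.
 */
#include <stdio.h>

#define NR_HIGH_IOPS_QUEUES    8    /* interrupt coalescing enabled */
#define NR_LOW_LATENCY_QUEUES  64   /* one per CPU, no coalescing */
#define HIGH_IOPS_DEPTH        8    /* per-arm outstanding-I/O threshold */

static unsigned int pick_reply_queue(unsigned int sdev_busy,
                                     unsigned int data_arms,
                                     unsigned int cpu)
{
        /*
         * The device already has many I/Os outstanding: throughput matters
         * more than latency, so spread across the small set of coalesced
         * high-IOPs queues.
         */
        if (sdev_busy > data_arms * HIGH_IOPS_DEPTH)
                return sdev_busy % NR_HIGH_IOPS_QUEUES;

        /*
         * Shallow outstanding queue: latency matters, so use the submitting
         * CPU's dedicated non-coalesced queue.
         */
        return NR_HIGH_IOPS_QUEUES + (cpu % NR_LOW_LATENCY_QUEUES);
}

int main(void)
{
        /* 2 outstanding I/Os, 1 arm -> low-latency queue of CPU 5 */
        printf("light load -> queue %u\n", pick_reply_queue(2, 1, 5));
        /* 64 outstanding I/Os, 1 arm -> one of the high-IOPs queues */
        printf("heavy load -> queue %u\n", pick_reply_queue(64, 1, 5));
        return 0;
}

The point is simply that the check is made per command, so a device moves
between the two sets of queues as its outstanding I/O count changes.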
>
> As an aside, that is actually an issue for blk-mq. For a 1-to-many HW
> queue-to-CPU mapping, limiting many CPUs to a single queue can limit IOPs,
> since HW queues can only consume at a limited rate.

We were able to achieve the performance target for the latest-generation
MegaRAID controller with this model: a few high-IOPs HW queues mapped to the
CPUs of the local NUMA node, plus low-latency queues with a one-to-one
mapping to CPUs. This is the default queue-segregation behavior of the
megaraid_sas driver, and it satisfies our IOPs and latency requirements
together. However, we do provide the module parameter "perf_mode" to tune
the queue behavior, i.e. to turn interrupt coalescing on/off on all HW
queues, in which case this one-to-many queue-to-CPU mapping does not happen.

Thanks,
Sumit

>
> >
> > Which would also have the benefit that we could support the io_uring
> > interface natively with megaraid_sas, which I think would be a benefit
> > on its own.
> >
>
> thanks,
> John
>