Re: [PATCH v2] RFD: switch MMC/SD to use blk-mq multiqueueing

Arnd Bergmann <arnd@xxxxxxxx> · Mon, 02 Jan 2017 10:40:15 +0100



On Thursday, December 29, 2016 12:59:51 AM CET Linus Walleij wrote:
> On Wed, Dec 28, 2016 at 9:55 AM, Christoph Hellwig <hch@xxxxxx> wrote:
> > On Tue, Dec 27, 2016 at 01:21:28PM +0100, Linus Walleij wrote:
> 
> >> On the contrary we expect a performance regression as mq has no
> >> scheduling. MQ is created for the use case where you have multiple
> >> hardware queues and they are so hungry for work that you have a problem
> >> feeding them all. Needless to say, on eMMC/SD we don't have that problem
> >> right now at least.
> >
> > That's not entirely correct.  blk-mq is designed to replace the legacy
> > request code eventually.  The focus is on not wasting CPU cycles, and
> > to support multiple queues (but not require them).
> 
> OK! Performance is paramount, so this indeed confirms that we need
> to re-engineer the MMC/SD stack to not rely on this kthread to "drive"
> transactions, instead we need to complete them quickly from the driver
> callbacks and let MQ drive.
> 
> A problem here is that issuing the requests is in blocking context
> while completion is in IRQ context (for most drivers) so we need to
> look into this.

I think whether issuing an mmc request requires blocking should ideally
be up to the host device driver. At least I'd hope that we can end up
with something like this, driving latency to the absolute minimum:

a) with MMC CMDQ support:
  - report actual queue depth
  - have blk-mq push requests directly into the device through
    the mmc-block driver and the host driver
  - if a host driver needs to block, make it use a private workqueue

b) without MMC CMDQ support:
  - report queue depth of '2'
  - first request gets handled as above
  - if one request is pending, prepare the second request and
    store a pointer to it in the mmc host structure (not that
    different from what we do today)
  - when the host driver completes a request, have it immediately
    issue the next one from the interrupt handler. In case we need
    to sleep here, use a threaded IRQ, or a workqueue. This should
    avoid the need for the NULL requests
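
To make b) a bit more concrete, here is a rough userspace sketch of the
two-slot idea. It is not kernel code: struct sim_host, struct sim_request
and the sim_*() functions are made-up names for illustration, and the
"interrupt handler" is just a plain function call. It only shows the
bookkeeping: one active request, one prepared request parked in the host
structure, and the completion path issuing the parked one right away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real mmc/blk-mq structures. */
struct sim_request {
	int id;
	bool prepared;	/* pre-processing (e.g. DMA mapping) has been done */
	bool done;
};

struct sim_host {
	struct sim_request *active;	/* request the hardware is working on */
	struct sim_request *next;	/* prepared request waiting its turn */
	int issued_log[8];		/* order in which requests hit the hardware */
	int issued;
};

/* Hand a request to the (simulated) hardware. */
static void sim_start_hw(struct sim_host *host, struct sim_request *req)
{
	assert(host->issued < 8);
	host->active = req;
	host->issued_log[host->issued++] = req->id;
}

/*
 * Submission path. Returns false when both slots are taken, which is
 * what a reported queue depth of 2 amounts to.
 */
static bool sim_queue_rq(struct sim_host *host, struct sim_request *req)
{
	if (host->active && host->next)
		return false;

	req->prepared = true;
	if (!host->active)
		sim_start_hw(host, req);	/* first request: issue directly */
	else
		host->next = req;		/* second request: park it */
	return true;
}

/*
 * Completion path, standing in for the interrupt handler: finish the
 * active request and immediately issue the parked one, if any.
 */
static void sim_complete_irq(struct sim_host *host)
{
	struct sim_request *req = host->active;

	if (!req)
		return;
	req->done = true;
	host->active = NULL;

	if (host->next) {
		req = host->next;
		host->next = NULL;
		sim_start_hw(host, req);
	}
}
```

In a real host driver the second half of sim_complete_irq() is the part
that might have to move into a threaded IRQ or a workqueue if issuing
can sleep.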

c) possible optimization for command packing without CMDQ:
  - similar to b)
  - report a longer queue (e.g. 8, maybe user selectable, to
    balance throughput against latency)
  - any request of the same type (read or write) as the one that
    is currently added to the host as the 'next' one can be
    added to that request so they get issued together
  - if the types are different, report the queue to be busy
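
The packing rule in c) could look roughly like the sketch below. Again
this is a userspace model with invented names (struct sim_pack_host,
sim_pack_*(), SIM_MAX_PACKED), not a proposal for the actual interface.
As a simplification every request goes through the 'next' batch, even
the first one, and the limit of 8 is just the example value from above.

```c
#include <assert.h>
#include <stdbool.h>

enum sim_dir { SIM_READ, SIM_WRITE };
enum sim_status { SIM_OK, SIM_BUSY };

#define SIM_MAX_PACKED 8	/* assumed queue length, could be tunable */

/* The batch currently being collected as the 'next' request. */
struct sim_packed {
	enum sim_dir dir;
	int count;			/* 0 means no batch is pending */
	int ids[SIM_MAX_PACKED];
};

struct sim_pack_host {
	bool hw_busy;			/* a batch is in flight */
	struct sim_packed next;
};

/*
 * Submission path: merge into the pending batch when the direction
 * matches, otherwise tell the caller the queue is busy.
 */
static enum sim_status sim_pack_queue_rq(struct sim_pack_host *host,
					 enum sim_dir dir, int id)
{
	struct sim_packed *next = &host->next;

	if (next->count > 0 && next->dir != dir)
		return SIM_BUSY;	/* different type: cannot be packed */
	if (next->count == SIM_MAX_PACKED)
		return SIM_BUSY;	/* batch is full */

	next->dir = dir;
	next->ids[next->count++] = id;
	return SIM_OK;
}

/*
 * Issue the whole pending batch if the hardware is idle. Returns the
 * number of requests that were issued together.
 */
static int sim_pack_issue(struct sim_pack_host *host)
{
	int n = host->next.count;

	if (host->hw_busy || n == 0)
		return 0;
	host->hw_busy = true;
	host->next.count = 0;
	return n;
}

/* Completion of the in-flight batch. */
static void sim_pack_complete(struct sim_pack_host *host)
{
	host->hw_busy = false;
}
```

The busy return is where the throughput/latency trade-off shows up: a
longer batch means a request of the other type waits longer before it
can even be queued.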

	Arnd