Re: [PATCH v2] RFD: switch MMC/SD to use blk-mq multiqueueing

Paolo Valente <paolo.valente@xxxxxxxxxx> · Tue, 3 Jan 2017 08:50:11 +0100

> On 2 Jan 2017, at 18:08, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> 
> On Monday, January 2, 2017 12:06:04 PM CET Hannes Reinecke wrote:
>> On 01/02/2017 10:40 AM, Arnd Bergmann wrote:
>>> b) without MMC CMDQ support:
>>>  - report queue depth of '2'
>>>  - first request gets handled as above
>>>  - if one request is pending, prepare the second request and
>>>    add a pointer to the mmc host structure (not that different
>>>    from what we do today)
>>>  - when the host driver completes a request, have it immediately
>>>    issue the next one from the interrupt handler. In case we need
>>>    to sleep here, use a threaded IRQ, or a workqueue. This should
>>>    avoid the need for the NULL requests
>>> 
>>> c) possible optimization for command packing without CMDQ:
>>>   - similar to b)
>>>   - report a longer queue (e.g. 8, maybe user selectable, to
>>>     balance throughput against latency)
>>>   - any request of the same type (read or write) as the one that
>>>     is currently added to the host as the 'next' one can be
>>>     added to that request so they get issued together
>>>   - if the types are different, report the queue to be busy
>>> 
>> Hmm. But that would amount to implementing yet another queuing mechanism
>> within the driver/mmc subsystem, wouldn't it?
>> 
>> Which is, incidentally, the same method the S/390 DASD driver uses
>> nowadays; report an arbitrary queue depth to the block layer and queue
>> all requests internally to better saturate the device.
>> 
>> However I'd really like to get rid of this, and tweak the block layer to
>> handle these cases.
>> 
>> One idea I had was to use a 'prep' function for mq; it would be executed
>> once the request is added to the queue.
>> Key point here is that 'prep' and 'queue_rq' would be two distinct
>> steps; the driver could do all required setup functionality during
>> 'prep', and then do the actual submission via 'queue_rq'.
> 
> Right, I had the same idea and I think even talked to you about that.
> I think this would address case 'b)' above perfectly, but I don't
> see how we can fit case 'c)' in that scheme.
> 
> Maybe if we could teach blk_mq that a device might be able to not
> just merge consecutive requests but also a limited set of
> non-consecutive requests in a way that MMC needs them? Unfortunately,
> the 'packed command' is rather MMC specific, and it might even
> be worthwhile to do some optimization regarding what commands
> get packed (e.g. only writes, only small requests, or only up to
> a total size).
> 
> There are some parallels to DASD, but it's not completely
> the same, even if we ignore the command packing.
> 
> - MMC wants the 'prepare' stage to reduce the latency from
>  dma_map_sg() calls. This is actually more related to the
>  CPU/SoC architecture and really important on most ARM systems
>  as they are not cache coherent. It is not specific to MMC,
>  except that MMC performance on low-end ARM is really
>  important to a lot of people since it's used in all
>  Android phones.
>  If we could find a way to get blk_mq to call blk_rq_map_sg()
>  for us, we wouldn't need a ->prepare() step or a fake queue
>  for case 'b)', and we could make use of that in other drivers
>  too.
> 
> - DASD in contrast wants to build the channel programs while
>  other I/O is running, and this is very specific to that
>  driver. There are probably some other things it does in its
>  own queue implementation that are also driver specific.
> 

Sorry for the noise, but, assuming that an ignorant point of view
might have some value, exactly because it ignores details, here is my
point of view.  IMO the cleanest design would be the one where blk-mq
does only the job it has been designed for, i.e., pushes requests into
queues, and the driver takes care of the idiosyncrasies of the
device.  Concretely, the driver could
1) advertise a long queue (e.g., 32) to be constantly fed with a
large window of requests;
2) not pass requests immediately to the device, but keep them as long
as needed, before finally handing them to the device.

The driver could then use the window of requests, internally queued, to
perform exactly the operations that it now performs with the
collaboration of blk, such as command packing.  blk-mq would be
unchanged.  If I'm not mistaken, this would match, at least in part,
what some of you already proposed in more detail.
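
To make the idea a bit more concrete, here is a rough user-space
sketch (plain C, not kernel code) of the two steps above: the driver
holds a window of up to 32 requests and, when it decides to dispatch,
packs the oldest request with the requests that directly follow it in
the same direction.  All names (struct fake_req, struct req_window,
window_queue(), window_dispatch(), MAX_PACK) are hypothetical, and the
packing rule is only one possible policy, not what the MMC code does
today.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW_DEPTH 32  /* depth advertised to blk-mq (the "e.g., 32" above) */
#define MAX_PACK      8  /* hypothetical cap on requests per packed command */

enum req_dir { REQ_READ, REQ_WRITE };

/* Stand-in for a block request; not the kernel's struct request. */
struct fake_req {
	enum req_dir dir;
	unsigned int sector;
	unsigned int nr_sectors;
};

/* Ring buffer of requests held by the driver, not yet sent to the device. */
struct req_window {
	struct fake_req reqs[WINDOW_DEPTH];
	size_t head;   /* index of the oldest held request */
	size_t count;  /* number of held requests */
};

/* Step 1: accept a request into the window; -1 means "queue busy". */
static int window_queue(struct req_window *w, struct fake_req rq)
{
	if (w->count == WINDOW_DEPTH)
		return -1;
	w->reqs[(w->head + w->count) % WINDOW_DEPTH] = rq;
	w->count++;
	return 0;
}

/*
 * Step 2: when the driver chooses to dispatch, take the oldest request
 * plus the requests directly behind it that have the same direction,
 * up to MAX_PACK.  'out' must have room for MAX_PACK entries.
 * Returns how many requests were packed together.
 */
static size_t window_dispatch(struct req_window *w, struct fake_req *out)
{
	size_t n = 0;
	enum req_dir dir;

	assert(out != NULL);
	if (w->count == 0)
		return 0;

	dir = w->reqs[w->head].dir;
	while (w->count > 0 && n < MAX_PACK && w->reqs[w->head].dir == dir) {
		out[n++] = w->reqs[w->head];
		w->head = (w->head + 1) % WINDOW_DEPTH;
		w->count--;
	}
	return n;
}
```

In a real driver window_queue() would presumably be called from
queue_rq, and window_dispatch() from wherever the driver decides the
device is ready; blk-mq itself would only ever see the depth-32 queue.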

I apologize if I'm talking complete nonsense.

Paolo

>> That would allow better load distribution for timing-sensitive
>> devices, and with a bit of luck remove the need for a separate queuing
>> inside the driver.
>> 
>> In any case, it looks like a proper subject for LSF/MM...
>> Linus? Arnd? Are you up for it?
> 
> I'm probably not going to work on it myself, but I think it would
> be great for Linus and/or Ulf to bring this up at LSF/MM.
> 
> 	Arnd
