[CC the block layer maintainers so they can smack my fingers if I misunderstood any of how the block layer semantics work...] On Mon, Nov 21, 2016 at 3:17 PM, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote: > On 21/11/16 16:02, Ulf Hansson wrote: >> I also believe, the implementation would become really complex, as you >> would need to "hold" the first write request, while waiting for a >> second to arrive. Then for how long shall you hold it? And then what >> if you are unlucky so the next is read request, thus you can't pack >> them. The solution will be suboptimal, won't it? > > It doesn't hold and wait now. So why would it in the blk-mq case? The current kthread in drivers/mmc/card/queue.c looks like this in essence: struct request *r; while (1) r = blk_fetch_request(q); issue(); } It is pulling out as much as it can to asynchronously issue two requests in parallel and also the packed command (as mentioned in the commitlog to $SUBJECT) pulls out even more stuff of the queue to speculatively issue things in a packed command. The block layer isn't supposed to be used like this. It is supposed to be used like so: 1. You get notified by the request_fn that is passed with blk_init_queue() 2. The request function fires a work. 3. The work pick ONE request with blk_fetch_request() and handles it. 4. Repeat from (1) Instead of doing this the MMC layer kthread is speculatively pulling out stuff of the queue whenever it can, including pulling out a few NULL at the end before it stops. The mechanism is similar to a person running along a queue and picking a segment of passengers into a bus to send off, batching them. Which is clever, but the block layer is not supposed to be used like that. It just happens to work. In blk-mq this speculative fetching is no longer possible. Instead you register a notification function in struct blk_mq_ops vtable .queue_rq() callback: this will be called by the block layer core whenever there is a request available on "our" queue. It further approves a .init_request() callback that is called overhead to allocate a per-request context, such as the current struct mmc_queue_req - just not quite because the current mmc_queue_req is not mapped 1-to-1 onto a request from the block layer because of packed command; but it is after this patch, hehe ;) Any speculative batching needs to happen *after* this, i.e. the MMC layer would have to report a certain larger queue depth (if you set it to 1 you only ever get one request at the time and have to finish it before you get a new one), group the requests itself with packed command or command queueing, then signal them back as they are confirmed completed by the device, or, if they cannot be grouped, handle as far as you can and put the remaining requests back on the queue (creating a "bubble" in the pipeline). Relying on iterating and inspecting the block layer queue is *not* possible with blk-mq, sure blk_fetch_request() is still there, but if you call it on an mq-registered queue, it will crash the kernel. (At least it did for me.) Clearly it is not intended to be used with MQ: none of the MQ-converted subsystems use this. (drivers/mtd/ubi/block.c is a good simple example) I liken these mechanisms to a pipeline: - The two-levels deep speculation buffer in struct mmc_queue field .mqrq[2] is a "software pipeline" in the MMC layer (so we can prepare and handle requests in parallel) - The packed command and command queue is a hardware-supported pipeline on the device side. Both try to overcome the hardware limitations of the MMC/SD logical interface. This batching-style pipelining isn't really solving the problem the way real multiqueue hardware does, so it is a poor man's patchwork to the problem. In either case, as Ulf notes, you need to get a few requests off the queue and group them using packed command or command queueing if possible, but that grouping needs to happen in the MMC/SD layer, after picking the requests from the queue. I think it is OK to do so and just put any requests you cannot pack into the pipeline back on the queue. But I am not sure (still learning....) Yours, Linus Walleij -- To unsubscribe from this list: send the line "unsubscribe linux-block" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html