On Mon, Nov 11, 2019 at 1:58 PM Baolin Wang <baolin.wang@xxxxxxxxxx> wrote: > On Mon, 11 Nov 2019 at 17:28, Arnd Bergmann <arnd@xxxxxxxx> wrote: > > On Mon, Nov 11, 2019 at 8:35 AM Baolin Wang <baolin.wang@xxxxxxxxxx> wrote: > > - Removing all the context switches and workqueues from the data submission > > path is also the right idea. As you found, there is still a workqueue inside > > of blk_mq that is used because it may get called from atomic context but > > the submission may get blocked in __mmc_claim_host(). This really > > needs to be changed as well, but not in the way I originally suggested: > > As Hannes suggested, the host interrrupt handler should always use > > request_threaded_irq() to have its own process context, and then pass a > > flag to blk_mq to say that we never need another workqueue there. > > So you mean we should complete the request in the host driver irq > thread context, then issue another request in this context by calling > blk_mq_run_hw_queues()? Yes. I assumed there was already code that would always run blk_mq_run_hw_queue() at I/O completion, but I can't find where that happens today. As I understand, the main difference to today is that __blk_mq_delay_run_hw_queue() can call into __blk_mq_run_hw_queue directly rather than using the delayed work queue once we can skip the BLK_MQ_F_BLOCKING check. > > - With that change in place calling a blocking __mmc_claim_host() is > > still a problem, so there should still be a nonblocking mmc_try_claim_host() > > for the submission path, leading to a BLK_STS_DEV_RESOURCE (?) > > return code from mmc_mq_queue_rq(). Basically mmc_mq_queue_rq() > > should always return right away, either after having queued the next I/O > > or with an error, but not waiting for the device in any way. > > Actually not only the mmc_claim_host() will block the MMC request > processing, in this routine, the mmc_blk_part_switch() and > mmc_retune() can also block the request processing. Moreover the part > switching and tuning should be sync operations, and we can not move > them to a work or a thread. Ok, I see. Those would also cause requests to be sent to the device or the host controller, right? Maybe we can treat them as "a non-IO request has successfully been queued to the device" events, returning busy from the mmc_mq_queue_rq() function and then running the queue again when they complete? > > - For the packed requests, there is apparently a very simple way to implement > > that without a software queue: mmc_mq_queue_rq() is allowed to look at > > and dequeue all requests that are currently part of the request_queue, > > so it should take out as many as it wants to submit at once and send > > them all down to the driver together, avoiding the need for any further > > round-trips to blk_mq or maintaining a queue in mmc. > > You mean we can dispatch a request directly from > elevator->type->ops.dispatch_request()? but we still need some helper > functions to check if these requests can be packed (the package > condition), and need to invent new APIs to start a packed request (or > using cqe interfaces, which means we still need to implement some cqe > callbacks). I don't know how the dispatch_request() function fits in there, what Hannes told me is that in ->queue_rq() you can always look at the following requests that are already queued up and take the next ones off the list. Looking at bd->last tells you if there are additional requests. If there are, you can look at the next one from blk_mq_hw_ctx (not sure how, but should not be hard to find) I also see that there is a commit_rqs() callback that may go along with queue_rq(), implementing that one could make this easier as well. > > - The DMA management (bounce buffer, map, unmap) that is currently > > done in mmc_blk_mq_issue_rq() should ideally be done in the > > init_request()/exit_request() (?) callbacks from mmc_mq_ops so this > > can be done asynchronously, out of the critical timing path for the > > submission. With this, there won't be any need for a software queue. > > This is not true, now the blk-mq will allocate some static request > objects (usually the static requests number should be the same with > the hardware queue depth) saved in struct blk_mq_tags. So the > init_request() is used to initialize the static requests when > allocating them, and call exit_request to free the static requests > when freeing the 'struct blk_mq_tags', such as the queue is dead. So > we can not move the DMA management into the init_request/exit_request. Ok, I must have misremembered which callback that is then, but I guess there is some other place to do it. Arnd