On Mon, Nov 18, 2019 at 11:43 AM Baolin Wang <baolin.wang7@xxxxxxxxx> wrote:
>
> From: Baolin Wang <baolin.wang@xxxxxxxxxx>
>
> Now the MMC read/write stack always waits for the previous request to be
> completed by mmc_blk_rw_wait() before sending a new request to the hardware,
> or queues a work item to complete the request. That brings context-switching
> overhead, especially at high I/O-per-second rates, and hurts I/O
> performance.
>
> Thus this patch introduces an MMC software queue interface based on the
> hardware command queue engine's interfaces. It follows the same idea as
> the hardware command queue engine and can remove the context
> switching. Moreover, we set the default queue depth to 32 for the software
> queue, which allows more requests to be prepared, merged and inserted
> into the I/O scheduler to improve performance, but we only allow 2 requests
> in flight. That is enough to let the irq handler always trigger the
> next request without a context switch, while also avoiding long latency.
>
> From the fio testing data in the cover letter, we can see the software
> queue improves performance with a 4K block size, by about 16% for random
> read and about 90% for random write, though there is no obvious improvement
> for sequential read and write.
>
> Moreover, we can extend the software queue interface to support MMC
> packed requests or packed commands in the future.
>
> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxx>
> Signed-off-by: Baolin Wang <baolin.wang7@xxxxxxxxx>

Overall, this looks like enough of a win that I think we should just use the current version for the moment, while still working on all the other improvements.

My biggest concern is the naming of "software queue", which is a concept that runs against the idea of doing all the heavy lifting, in particular the queueing, in bfq. Then again, it does not /actually/ do much queuing at all, beyond preparing a single request so it can fire it off early.
Even with the packed command support added in, there is not really any queuing beyond what it has to do anyway. Using the infrastructure that was added for cqe seems like a good compromise, as it already has a way to hand down multiple requests to the hardware and is overall more modern than the existing support.

I still think we should do all the other things I mentioned in my earlier reply today, but they can be done as add-ons:

- remove all blocking calls from the queue_rq() function: partition changes, retuning, etc. should become non-blocking operations that return busy from queue_rq()
- get bfq to send multiple requests all the way down into the device driver, so we don't actually have to queue them here at all to do packed commands
- add packed command support
- submit commands from hardirq context if this is advantageous, and move everything else in the irq handler into irqthread context, in order to remove all other workqueue and softirq processing from the request processing path

If we can agree on this as the rough plan for the future, feel free to add my

Reviewed-by: Arnd Bergmann <arnd@xxxxxxxx>