Hi

Here is V15 of the hardware command queue patches without the software
command queue patches, now using blk-mq and now with blk-mq support for
non-CQE I/O.

V14 included a number of fixes to existing code, changed the default to
blk-mq, and added patches to remove legacy code.

HW CMDQ offers 25% - 50% better random multi-threaded I/O. I see a slight
2% drop in sequential read speed but no change to sequential write.

Non-CQE blk-mq showed a 3% decrease in sequential read performance. This
seemed to be coming from the inferior latency of running work items
compared with a dedicated thread. Hacking the blk-mq workqueue to be
unbound reduced the performance degradation from 3% to 1%.

While we should look at changing blk-mq to give better workqueue
performance, a bigger gain is likely to be made by adding a new host API
to enable the next already-prepared request to be issued directly from
within the ->done() callback of the current request.

Changes since V14:
	mmc: block: Fix missing blk_put_request()
	mmc: block: Check return value of blk_get_request()
	mmc: core: Do not leave the block driver in a suspended state
	mmc: block: Ensure that debugfs files are removed
		Dropped because they have been applied
	mmc: block: Use data timeout in card_busy_detect()
		Replaced by other patches
	mmc: block: Add blk-mq support
		Rename mmc_blk_ss_read() to mmc_blk_read_single()
		Add more error handling to single sector read
		Let mmc_blk_mq_complete_rq() cater for requests already "updated" by recovery
		Rename mmc_blk_mq_acct_req_done() to mmc_blk_mq_dec_in_flight()
		Add comments about synchronization
		Add comment about not dispatching in parallel
		Add comment about the queue depth
	mmc: block: Add CQE support
		Add comment about CQE queue depth
	mmc: block: blk-mq: Add support for direct completion
		Rename mmc_queue_direct_complete() to mmc_host_done_complete()
		Rename MMC_CAP_DIRECT_COMPLETE to MMC_CAP_DONE_COMPLETE
	mmc: block: blk-mq: Separate card polling from recovery
		Ensure gen_err is reported as an error
	mmc: block: Make card_busy_detect() accumulate all response error bits
		Patch moved later in the patch set and adjusted accordingly
	mmc: block: blk-mq: Check error bits and save the exception bit when polling card busy
		Adjusted due to patch re-ordering
	mmc: block: Check the timeout correctly in card_busy_detect()
		New patch.
	mmc: block: Add timeout_clks when calculating timeout
		New patch.
	mmc: block: Reduce polling timeout from 10 minutes to 10 seconds
		New patch.

Changes since V13:
	mmc: block: Fix missing blk_put_request()
		New patch.
	mmc: block: Check return value of blk_get_request()
		New patch.
	mmc: core: Do not leave the block driver in a suspended state
		New patch.
	mmc: block: Ensure that debugfs files are removed
		New patch.
	mmc: block: No need to export mmc_cleanup_queue()
		New patch.
	mmc: block: Simplify cleaning up the queue
		New patch.
	mmc: block: Use data timeout in card_busy_detect()
		New patch.
	mmc: block: Check for transfer state in card_busy_detect()
		New patch.
	mmc: block: Make card_busy_detect() accumulate all response error bits
		New patch.
	mmc: core: Make mmc_pre_req() and mmc_post_req() available
		New patch.
	mmc: core: Add parameter use_blk_mq
		Default to y
	mmc: block: Add blk-mq support
		Wrap blk_mq_end_request / blk_end_request_all
		Rename mmc_blk_rw_recovery -> mmc_blk_mq_rw_recovery
		Additional parentheses to '==' expressions
		Use mmc_pre_req() / mmc_post_req()
		Fix missing tuning release on error after mmc_start_request()
		Expand comment about timeouts
		Allow for possibility that the queue is quiesced when removing
		Ensure complete_work is flushed when removing
	mmc: block: Add CQE support
		Additional parentheses to '==' expressions
	mmc: block: blk-mq: Check error bits and save the exception bit when polling card busy
		Replaces patch "Stop using card_busy_detect()" retaining card_busy_detect()
	mmc: block: blk-mq: Stop using legacy recovery
		Allow for SPI
	mmc: mmc_test: Do not use mmc_start_areq() anymore
		New patch.
	mmc: core: Remove option not to use blk-mq
		New patch.
	mmc: block: Remove code no longer needed after the switch to blk-mq
		New patch.
	mmc: core: Remove code no longer needed after the switch to blk-mq
		New patch.

Changes since V12:
	mmc: block: Add error-handling comments
		New patch.
	mmc: block: Add blk-mq support
		Use legacy error handling
	mmc: block: Add CQE support
		Re-base
	mmc: block: blk-mq: Add support for direct completion
		New patch.
	mmc: block: blk-mq: Separate card polling from recovery
		New patch.
	mmc: block: blk-mq: Stop using card_busy_detect()
		New patch.
	mmc: block: blk-mq: Stop using legacy recovery
		New patch.

Changes since V11:
	Split "mmc: block: Add CQE and blk-mq support" into 2 patches

Changes since V10:
	mmc: core: Remove unnecessary host claim
	mmc: core: Introduce host claiming by context
	mmc: core: Add support for handling CQE requests
	mmc: mmc: Enable Command Queuing
	mmc: mmc: Enable CQE's
	mmc: block: Use local variables in mmc_blk_data_prep()
	mmc: block: Prepare CQE data
	mmc: block: Factor out mmc_setup_queue()
	mmc: core: Add parameter use_blk_mq
	mmc: core: Export mmc_start_bkops()
	mmc: core: Export mmc_start_request()
	mmc: core: Export mmc_retune_hold_now() and mmc_retune_release()
		Dropped because they have been applied
	mmc: block: Add CQE and blk-mq support
		Extend blk-mq support for asynchronous read / writes to all host
		controllers including those that require polling. The direct
		completion path is still available but depends on a new capability
		flag.
		Drop blk-mq support for synchronous read / writes.
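
The V10 note above says the direct completion path depends on a new
capability flag (renamed to MMC_CAP_DONE_COMPLETE in V14). The choice it
controls can be pictured with a minimal, self-contained C sketch. It is
not code from these patches: struct hypo_host, struct hypo_request,
HYPO_CAP_DONE_COMPLETE and the hypo_*() functions are hypothetical
stand-ins, and the work item is modelled as a plain deferred list that is
drained later, not as a real kernel workqueue. It only shows completing a
request directly from the host's ->done() callback versus deferring the
completion to a work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for MMC_CAP_DONE_COMPLETE. */
#define HYPO_CAP_DONE_COMPLETE (1u << 0)
#define HYPO_MAX_DEFERRED 8

/* Hypothetical request: only tracks whether it has been completed. */
struct hypo_request {
	int tag;
	bool completed;
};

/* Hypothetical host: capability bits plus a list modelling the work item. */
struct hypo_host {
	unsigned int caps;
	struct hypo_request *deferred[HYPO_MAX_DEFERRED];
	size_t nr_deferred;
};

static void hypo_complete_request(struct hypo_request *req)
{
	req->completed = true;
}

/* Models the host's ->done() callback for a finished request. */
static void hypo_request_done(struct hypo_host *host, struct hypo_request *req)
{
	if (host->caps & HYPO_CAP_DONE_COMPLETE) {
		/* Host opted in: complete the request right here. */
		hypo_complete_request(req);
		return;
	}
	/* Otherwise queue it for the work item, which runs later. */
	assert(host->nr_deferred < HYPO_MAX_DEFERRED);
	host->deferred[host->nr_deferred++] = req;
}

/* Models the work item: completes every deferred request, then empties the list. */
static void hypo_complete_work(struct hypo_host *host)
{
	for (size_t i = 0; i < host->nr_deferred; i++)
		hypo_complete_request(host->deferred[i]);
	host->nr_deferred = 0;
}
```

In this sketch the deferred path works for any host, at the cost of the
extra latency of running a work item described at the start of this
letter, while a host that sets the flag opts in to completing requests
directly in ->done().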

Venkat Gopalakrishnan (1):
      mmc: cqhci: support for command queue enabled host

Changes since V9:
	mmc: block: Add CQE and blk-mq support
		- reinstate mq support for REQ_OP_DRV_IN/OUT that was removed because
		it was incorrectly assumed to be handled by the rpmb character device
		- don't check for rpmb block device anymore
	mmc: cqhci: support for command queue enabled host
		Fix cqhci_set_irqs() as per Haibo Chen

Changes since V8:
	Re-based
	mmc: core: Introduce host claiming by context
		Slightly simplified as per Ulf
	mmc: core: Export mmc_retune_hold_now() and mmc_retune_release()
		New patch.
	mmc: block: Add CQE and blk-mq support
		Fix missing ->post_req() on the error path

Changes since V7:
	Re-based
	mmc: core: Introduce host claiming by context
		Slightly simplified
	mmc: core: Add parameter use_blk_mq
		New patch.
	mmc: core: Remove unnecessary host claim
		New patch.
	mmc: core: Export mmc_start_bkops()
		New patch.
	mmc: core: Export mmc_start_request()
		New patch.
	mmc: block: Add CQE and blk-mq support
		Add blk-mq support for non-CQE requests

Changes since V6:
	mmc: core: Introduce host claiming by context
		New patch.
	mmc: core: Move mmc_start_areq() declaration
		Dropped because it has been applied
	mmc: block: Fix block status codes
		Dropped because it has been applied
	mmc: host: Add CQE interface
		Dropped because it has been applied
	mmc: core: Turn off CQE before sending commands
		Dropped because it has been applied
	mmc: block: Factor out mmc_setup_queue()
		New patch.
	mmc: block: Add CQE support
		Drop legacy support and add blk-mq support

Changes since V5:
	Re-based
	mmc: core: Add mmc_retune_hold_now()
		Dropped because it has been applied
	mmc: core: Add members to mmc_request and mmc_data for CQE's
		Dropped because it has been applied
	mmc: core: Move mmc_start_areq() declaration
		New patch at Ulf's request
	mmc: block: Fix block status codes
		Another unrelated patch
	mmc: host: Add CQE interface
		Move recovery_notifier() callback to struct mmc_request
	mmc: core: Add support for handling CQE requests
		Roll __mmc_cqe_request_done() into mmc_cqe_request_done()
		Move function declarations requested by Ulf
	mmc: core: Remove unused MMC_CAP2_PACKED_CMD
		Dropped because it has been applied
	mmc: block: Add CQE support
		Add explanation to commit message
		Adjustment for changed recovery_notifier() callback
	mmc: cqhci: support for command queue enabled host
		Adjustment for changed recovery_notifier() callback
	mmc: sdhci-pci: Add CQHCI support for Intel GLK
		Add DCMD capability for Intel controllers except GLK

Changes since V4:
	mmc: core: Add mmc_retune_hold_now()
		Add explanation to commit message.
	mmc: host: Add CQE interface
		Add comments to callback declarations.
	mmc: core: Turn off CQE before sending commands
		Add explanation to commit message.
	mmc: core: Add support for handling CQE requests
		Add comments as requested by Ulf.
	mmc: core: Remove unused MMC_CAP2_PACKED_CMD
		New patch.
	mmc: mmc: Enable Command Queuing
		Adjust for removal of MMC_CAP2_PACKED_CMD.
		Add a comment about Packed Commands.
	mmc: mmc: Enable CQE's
		Remove unnecessary check for MMC_CAP2_CQE
	mmc: block: Use local variables in mmc_blk_data_prep()
		New patch.
	mmc: block: Prepare CQE data
		Adjust due to "mmc: block: Use local variables in mmc_blk_data_prep()"
		Remove priority setting.
		Add explanation to commit message.
	mmc: cqhci: support for command queue enabled host
		Fix transfer descriptor setting in cqhci_set_tran_desc() for 32-bit DMA

Changes since V3:
	Adjusted ...blk_end_request...() for new block status codes
	Fixed CQHCI transaction descriptor for "no DCMD" case

Changes since V2:
	Dropped patches that have been applied.
	Re-based
	Added "mmc: sdhci-pci: Add CQHCI support for Intel GLK"

Changes since V1:
	"Share mmc request array between partitions" is dependent
	on changes in "Introduce queue semantics", so added that
	and block fixes:
	Added "Fix is_waiting_last_req set incorrectly"
	Added "Fix cmd error reset failure path"
	Added "Use local var for mqrq_cur"
	Added "Introduce queue semantics"

Changes since RFC:
	Re-based on next.
	Added comment about command queue priority.
	Added some acks and reviews.


Adrian Hunter (21):
      mmc: block: No need to export mmc_cleanup_queue()
      mmc: block: Simplify cleaning up the queue
      mmc: core: Make mmc_pre_req() and mmc_post_req() available
      mmc: block: Add error-handling comments
      mmc: core: Add parameter use_blk_mq
      mmc: block: Add blk-mq support
      mmc: block: Add CQE support
      mmc: sdhci-pci: Add CQHCI support for Intel GLK
      mmc: block: blk-mq: Add support for direct completion
      mmc: block: blk-mq: Separate card polling from recovery
      mmc: block: Make card_busy_detect() accumulate all response error bits
      mmc: block: blk-mq: Check error bits and save the exception bit when polling card busy
      mmc: block: Check the timeout correctly in card_busy_detect()
      mmc: block: Check for transfer state in card_busy_detect()
      mmc: block: Add timeout_clks when calculating timeout
      mmc: block: Reduce polling timeout from 10 minutes to 10 seconds
      mmc: block: blk-mq: Stop using legacy recovery
      mmc: mmc_test: Do not use mmc_start_areq() anymore
      mmc: core: Remove option not to use blk-mq
      mmc: block: Remove code no longer needed after the switch to blk-mq
      mmc: core: Remove code no longer needed after the switch to blk-mq

Venkat Gopalakrishnan (1):
      mmc: cqhci: support for command queue enabled host

 drivers/mmc/core/block.c          | 1383 +++++++++++++++++++++----------------
 drivers/mmc/core/block.h          |   12 +-
 drivers/mmc/core/bus.c            |    2 -
 drivers/mmc/core/core.c           |  216 +-----
 drivers/mmc/core/core.h           |   39 +-
 drivers/mmc/core/host.h           |    6 +-
 drivers/mmc/core/mmc_test.c       |  122 ++--
 drivers/mmc/core/queue.c          |  504 +++++++++-----
 drivers/mmc/core/queue.h          |   64 +-
 drivers/mmc/host/Kconfig          |   14 +
 drivers/mmc/host/Makefile         |    1 +
 drivers/mmc/host/cqhci.c          | 1150 ++++++++++++++++++++++++++++++
 drivers/mmc/host/cqhci.h          |  240 +++++++
 drivers/mmc/host/sdhci-pci-core.c |  155 ++++-
 include/linux/mmc/host.h          |    5 +-
 15 files changed, 2835 insertions(+), 1078 deletions(-)
 create mode 100644 drivers/mmc/host/cqhci.c
 create mode 100644 drivers/mmc/host/cqhci.h


Regards
Adrian