Currently a full set of static requests is allocated per hw queue per
tagset when a shared sbitmap is used.

However, only tagset->queue_depth requests may be active at any given
time, so only tagset->queue_depth static requests are required.

The same goes for using an IO scheduler, which allocates a full set of
static requests per hw queue per request queue.

This series significantly reduces memory usage in both scenarios by
allocating the static rqs per tagset and per request queue,
respectively, rather than per hw queue per tagset and per request
queue.

For the megaraid_sas driver on my 128-CPU arm64 system with 1x SATA
disk, we save approx. 300MB(!) [370MB -> 60MB]

A couple of patches are marked as RFC, as there may be a better
implementation approach. Any further testing would also be
appreciated.

John Garry (9):
  blk-mq: Change rqs check in blk_mq_free_rqs()
  block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
  blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
  blk-mq: Add blk_mq_tag_resize_sched_shared_sbitmap()
  blk-mq: Invert check in blk_mq_update_nr_requests()
  blk-mq: Refactor blk_mq_{alloc,free}_rqs
  blk-mq: Allocate per tag set static rqs for shared sbitmap
  blk-mq: Allocate per request queue static rqs for shared sbitmap
  blk-mq: Clear mappings for shared sbitmap sched static rqs

 block/blk-core.c       |   2 +-
 block/blk-mq-sched.c   |  57 ++++++++++++--
 block/blk-mq-sched.h   |   2 +-
 block/blk-mq-tag.c     |  22 ++++--
 block/blk-mq-tag.h     |   1 +
 block/blk-mq.c         | 165 +++++++++++++++++++++++++++++++----------
 block/blk-mq.h         |   9 +++
 drivers/block/rbd.c    |   2 +-
 include/linux/blk-mq.h |   4 +
 include/linux/blkdev.h |   6 +-
 10 files changed, 215 insertions(+), 55 deletions(-)

-- 
2.26.2
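
P.S. As a rough illustration of where the savings come from (this is
not code from the series, and the queue count, depth and request size
below are made-up assumptions rather than real megaraid_sas values):
with a shared sbitmap only tagset->queue_depth requests can be in
flight across all hw queues, so the per-hw-queue multiplier in the
old allocation buys nothing.

#include <stdio.h>

int main(void)
{
	/* Hypothetical values, for illustration only */
	unsigned long nr_hw_queues = 128;  /* e.g. one hw queue per CPU */
	unsigned long queue_depth = 1024;  /* tagset->queue_depth */
	unsigned long rq_size = 384;       /* request + driver payload, bytes */

	/* Old scheme: full set of static rqs per hw queue per tagset */
	unsigned long per_hctx = nr_hw_queues * queue_depth * rq_size;

	/* New scheme: one set of static rqs per tagset */
	unsigned long per_tagset = queue_depth * rq_size;

	printf("per hw queue per tagset: %lu KB\n", per_hctx / 1024);   /* 49152 KB */
	printf("per tagset:              %lu KB\n", per_tagset / 1024); /* 384 KB */
	return 0;
}

The same factor-of-nr_hw_queues reduction applies to the IO scheduler
static rqs, which move from per hw queue per request queue to per
request queue.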