Hi,

The 1st patch adds percpu_ref_is_initialized() so that MD code no longer
touches percpu-refcount internals.

The 2nd patch reduces the memory footprint of percpu_ref in the fast path
from 7 words to 2 words, because percpu_ref is often used in the fast path
and embedded in the user's struct.

The 3rd patch moves .q_usage_counter into the 1st cacheline of
'request_queue'.

A simple test on null_blk shows a ~2% IOPS boost on a 16-core (two threads
per core), dual-socket NUMA machine.

V4:
	- rename percpu_ref_inited as percpu_ref_is_initialized

V3:
	- fix kernel oops on MD
	- add a patch so that md code avoids using percpu-refcount internals
	- pass the Red Hat CKI test, which was run by Veronika Kabatova

V2:
	- pass 'gfp' to kzalloc() to fix the block/027 failure reported by
	  the kernel test robot
	- use a spin lock to protect percpu_ref_is_zero() against concurrent
	  destruction of the percpu-refcount

Ming Lei (3):
  percpu_ref: add percpu_ref_is_initialized for MD
  percpu_ref: reduce memory footprint of percpu_ref in fast path
  block: move 'q_usage_counter' into front of 'request_queue'

 drivers/infiniband/sw/rdmavt/mr.c |   2 +-
 drivers/md/md.c                   |   2 +-
 include/linux/blkdev.h            |   3 +-
 include/linux/percpu-refcount.h   |  46 ++++------
 lib/percpu-refcount.c             | 137 +++++++++++++++++++++++-------
 5 files changed, 126 insertions(+), 64 deletions(-)

Cc: Veronika Kabatova <vkabatov@xxxxxxxxxx>
Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: Bart Van Assche <bvanassche@xxxxxxx>
-- 
2.25.2
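
For readers who want a picture of the "7 words to 2 words" reduction
described in this cover letter, here is a minimal userspace sketch of the
general idea: keep only the fast-path word plus one pointer embedded in the
user's struct, and move the rest behind that pointer. This is not the patch
itself. Every sketch_* name is hypothetical, the field lists are assumptions
chosen for illustration, and the "seven words" size assumes a typical LP64
Linux build.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an RCU callback head: two pointer-sized words. */
struct sketch_rcu_head {
	void *next;
	void (*func)(struct sketch_rcu_head *head);
};

/* Assumed "before" layout: fast-path and slow-path state in one struct.
 * On a typical LP64 build this occupies seven words. */
struct sketch_ref_flat {
	long count;
	unsigned long percpu_count_ptr;
	void (*release)(struct sketch_ref_flat *ref);
	void (*confirm_switch)(struct sketch_ref_flat *ref);
	bool force_atomic;
	struct sketch_rcu_head rcu;
};

struct sketch_ref;
typedef void (sketch_ref_func_t)(struct sketch_ref *ref);

/* Assumed "after" layout, part 1: slow-path state, allocated separately. */
struct sketch_ref_data {
	long count;
	sketch_ref_func_t *release;
	sketch_ref_func_t *confirm_switch;
	bool force_atomic;
	struct sketch_rcu_head rcu;
	struct sketch_ref *ref;		/* back-pointer to the embedded part */
};

/* Assumed "after" layout, part 2: the only part embedded in the user's
 * struct, two words in total. */
struct sketch_ref {
	unsigned long percpu_count_ptr;
	struct sketch_ref_data *data;
};

/* Allocate the out-of-line state; returns 0 on success, -1 on failure. */
int sketch_ref_init(struct sketch_ref *ref, sketch_ref_func_t *release)
{
	struct sketch_ref_data *data = calloc(1, sizeof(*data));

	if (!data)
		return -1;
	data->count = 1;
	data->release = release;
	data->ref = ref;
	ref->percpu_count_ptr = 0;
	ref->data = data;
	return 0;
}

/* Hypothetical check; not the kernel's percpu_ref_is_initialized(). */
bool sketch_ref_has_data(const struct sketch_ref *ref)
{
	return ref->data != NULL;
}

/* Free the out-of-line state and mark the ref as torn down. */
void sketch_ref_exit(struct sketch_ref *ref)
{
	free(ref->data);
	ref->data = NULL;
}
```

The trade-off in this sketch is one extra pointer dereference on the slow
path in exchange for a smaller embedded struct, which leaves more room in
the first cacheline of the containing structure.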