From: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>

v2:
 - Change to use call_single_data_t, which uses __aligned() to avoid
   using 2 cache lines for 1 csd. Thanks Ming Lei.
 - [v1] https://lore.kernel.org/all/20230627120854.971475-1-chengming.zhou@xxxxxxxxx/

Hello,

After commit be4c427809b0 ("blk-mq: use the I/O scheduler for writes
from the flush state machine"), rq->flush can't reuse rq->elv anymore,
since flush_data requests can now go into the I/O scheduler.

That commit increased the size of struct request by 24 bytes. This
patchset decreases the size by 40 bytes, which is good, I think.

Patch 1 uses a percpu csd to do remote completion instead of a per-rq
csd, decreasing the size by 24 bytes.

Patches 2-3 reuse rq->queuelist in the flush state machine pending list
and maintain a u64 counter of inflight flush_data requests, decreasing
the size by 16 bytes.

Patch 4 is just a cleanup done along the way.

Thanks for comments!

Chengming Zhou (4):
  blk-mq: use percpu csd to remote complete instead of per-rq csd
  blk-flush: count inflight flush_data requests
  blk-flush: reuse rq queuelist in flush state machine
  blk-mq: delete unused completion_data in struct request

 block/blk-flush.c      | 19 +++++++++----------
 block/blk-mq.c         | 12 ++++++++----
 block/blk.h            |  5 ++---
 include/linux/blk-mq.h | 10 ++--------
 4 files changed, 21 insertions(+), 25 deletions(-)

--
2.39.2