On 2023/6/28 10:20, Ming Lei wrote: > On Tue, Jun 27, 2023 at 08:08:51PM +0800, chengming.zhou@xxxxxxxxx wrote: >> From: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx> >> >> If request need to be completed remotely, we insert it into percpu llist, >> and smp_call_function_single_async() if llist is empty previously. >> >> We don't need to use per-rq csd, percpu csd is enough. And the size of >> struct request is decreased by 24 bytes. >> >> Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx> >> --- >> block/blk-mq.c | 12 ++++++++---- >> include/linux/blk-mq.h | 5 +---- >> 2 files changed, 9 insertions(+), 8 deletions(-) >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index decb6ab2d508..a36822479b94 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -43,6 +43,7 @@ >> #include "blk-ioprio.h" >> >> static DEFINE_PER_CPU(struct llist_head, blk_cpu_done); >> +static DEFINE_PER_CPU(struct __call_single_data, blk_cpu_csd); > > It might be better to use call_single_data, given: > > /* Use __aligned() to avoid to use 2 cache lines for 1 csd */ > typedef struct __call_single_data call_single_data_t > __aligned(sizeof(struct __call_single_data)); > Good, I will change to use this. >> >> static void blk_mq_insert_request(struct request *rq, blk_insert_t flags); >> static void blk_mq_request_bypass_insert(struct request *rq, >> @@ -1156,13 +1157,13 @@ static void blk_mq_complete_send_ipi(struct request *rq) >> { >> struct llist_head *list; >> unsigned int cpu; >> + struct __call_single_data *csd; >> >> cpu = rq->mq_ctx->cpu; >> list = &per_cpu(blk_cpu_done, cpu); >> - if (llist_add(&rq->ipi_list, list)) { >> - INIT_CSD(&rq->csd, __blk_mq_complete_request_remote, rq); >> - smp_call_function_single_async(cpu, &rq->csd); >> - } >> + csd = &per_cpu(blk_cpu_csd, cpu); >> + if (llist_add(&rq->ipi_list, list)) >> + smp_call_function_single_async(cpu, csd); >> } > > This way is cleaner, and looks correct, given block softirq is guaranteed to be > scheduled to consume the list if one new request is added to this percpu list, > either smp_call_function_single_async() returns -EBUSY or 0. > If this llist_add() see the llist is empty, the consumer function in the softirq on the remote CPU must have consumed the llist, so smp_call_function_single_async() won't return -EBUSY ? Thanks.