Hi,

On 2019/10/16 5:04, Alexei Starovoitov wrote:
> On Mon, Oct 14, 2019 at 5:21 AM Hou Tao <houtao1@xxxxxxxxxx> wrote:
>>
>> For network stack, RPS, namely Receive Packet Steering, is used to
>> distribute network protocol processing from hardware-interrupted CPU
>> to specific CPUs and alleviating soft-irq load of the interrupted CPU.
>>
>> For block layer, soft-irq (for single queue device) or hard-irq
>> (for multiple queue device) is used to handle IO completion, so
>> RPS will be useful when the soft-irq load or the hard-irq load
>> of a specific CPU is too high, or a specific CPU set is required
>> to handle IO completion.
>>
>> Instead of setting the CPU set used for handling IO completion
>> through sysfs or procfs, we can attach an eBPF program to the
>> request-queue, provide some useful info (e.g., the CPU
>> which submits the request) to the program, and let the program
>> decides the proper CPU for IO completion handling.
>>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> ...
>>
>> +	rcu_read_lock();
>> +	prog = rcu_dereference_protected(q->prog, 1);
>> +	if (prog)
>> +		bpf_ccpu = BPF_PROG_RUN(q->prog, NULL);
>> +	rcu_read_unlock();
>> +
>>	cpu = get_cpu();
>> -	if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> -		shared = cpus_share_cache(cpu, ctx->cpu);
>> +	if (bpf_ccpu < 0 || !cpu_online(bpf_ccpu)) {
>> +		ccpu = ctx->cpu;
>> +		if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> +			shared = cpus_share_cache(cpu, ctx->cpu);
>> +	} else
>> +		ccpu = bpf_ccpu;
>>
>> -	if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
>> +	if (cpu != ccpu && !shared && cpu_online(ccpu)) {
>>		rq->csd.func = __blk_mq_complete_request_remote;
>>		rq->csd.info = rq;
>>		rq->csd.flags = 0;
>> -		smp_call_function_single_async(ctx->cpu, &rq->csd);
>> +		smp_call_function_single_async(ccpu, &rq->csd);
>
> Interesting idea.
> Not sure whether such programability makes sense from
> block layer point of view.
>
> From bpf side having a program with NULL input context is
> a bit odd. We never had such things in the past, so this patchset
> won't work as-is.

No, it just works.

> Also no-input means that the program choices are quite limited.
> Other than round robin and random I cannot come up with other
> cpu selection idea.
> I suggest to do writable tracepoint here instead.
> Take a look at trace_nbd_send_request.
> BPF prog can write into 'request'.
> For your use case it will be able to write into 'bpf_ccpu' local variable.
> If you keep it as raw tracepoint and don't add the actual tracepoint
> with TP_STRUCT__entry and TP_fast_assign then it won't be abi
> and you can change it later or remove it altogether.
>

Your suggestion is much simpler: there is no need to add a new program
type. All that needs to be done is to add a raw tracepoint, move bpf_ccpu
into struct request, and let a BPF program modify it. I will try it.
Thanks for your suggestions.

Regards,
Tao