On 10/15/19 11:04 PM, Alexei Starovoitov wrote:
> On Mon, Oct 14, 2019 at 5:21 AM Hou Tao <houtao1@xxxxxxxxxx> wrote:
>>
>> For network stack, RPS, namely Receive Packet Steering, is used to
>> distribute network protocol processing from hardware-interrupted CPU
>> to specific CPUs and alleviating soft-irq load of the interrupted CPU.
>>
>> For block layer, soft-irq (for single queue device) or hard-irq
>> (for multiple queue device) is used to handle IO completion, so
>> RPS will be useful when the soft-irq load or the hard-irq load
>> of a specific CPU is too high, or a specific CPU set is required
>> to handle IO completion.
>>
>> Instead of setting the CPU set used for handling IO completion
>> through sysfs or procfs, we can attach an eBPF program to the
>> request-queue, provide some useful info (e.g., the CPU
>> which submits the request) to the program, and let the program
>> decides the proper CPU for IO completion handling.
>>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> ...
>>
>> +	rcu_read_lock();
>> +	prog = rcu_dereference_protected(q->prog, 1);
>> +	if (prog)
>> +		bpf_ccpu = BPF_PROG_RUN(q->prog, NULL);
>> +	rcu_read_unlock();
>> +
>> 	cpu = get_cpu();
>> -	if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> -		shared = cpus_share_cache(cpu, ctx->cpu);
>> +	if (bpf_ccpu < 0 || !cpu_online(bpf_ccpu)) {
>> +		ccpu = ctx->cpu;
>> +		if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> +			shared = cpus_share_cache(cpu, ctx->cpu);
>> +	} else
>> +		ccpu = bpf_ccpu;
>>
>> -	if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
>> +	if (cpu != ccpu && !shared && cpu_online(ccpu)) {
>> 		rq->csd.func = __blk_mq_complete_request_remote;
>> 		rq->csd.info = rq;
>> 		rq->csd.flags = 0;
>> -		smp_call_function_single_async(ctx->cpu, &rq->csd);
>> +		smp_call_function_single_async(ccpu, &rq->csd);
>
> Interesting idea.
> Not sure whether such programability makes sense from
> block layer point of view.
>
> From bpf side having a program with NULL input context is
> a bit odd. We never had such things in the past, so this patchset
> won't work as-is.
> Also no-input means that the program choices are quite limited.
> Other than round robin and random I cannot come up with other
> cpu selection ideas.
> I suggest to do writable tracepoint here instead.
> Take a look at trace_nbd_send_request.
> BPF prog can write into 'request'.
> For your use case it will be able to write into 'bpf_ccpu' local variable.
> If you keep it as raw tracepoint and don't add the actual tracepoint
> with TP_STRUCT__entry and TP_fast_assign then it won't be abi
> and you can change it later or remove it altogether.
>
That basically was my idea, too.
Actually, I was coming from a different angle, namely trying to figure
out how we could do generic error injection in the block layer.
eBPF would be one way of doing it, kprobes another.

But writable trace events ... I'll have to check if we can leverage
that here, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
hare@xxxxxxx                                   +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer
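The completion-CPU selection in the quoted hunk can be modeled outside the kernel. This shows how a BPF-supplied CPU interacts with the existing same-cache fallback. It is a minimal userspace sketch, not kernel code. The online mask, the two-LLC topology, pick_completion_cpu() and the *_model helpers are all hypothetical stand-ins. The sketch also assumes bpf_ccpu is -1 when no program is attached, since its initialisation is not visible in the quoted hunk. Unlike the hunk, the model bounds-checks the BPF result before asking whether that CPU is online.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 8

/* Hypothetical stand-in for cpu_online(): CPUs 6 and 7 are offline. */
static const bool cpu_online_model[NR_CPUS_MODEL] = {
	true, true, true, true, true, true, false, false
};

/* Hypothetical stand-in for cpus_share_cache():
 * assume CPUs 0-3 share one LLC and CPUs 4-7 share another. */
static bool cpus_share_cache_model(int a, int b)
{
	return (a / 4) == (b / 4);
}

/*
 * Mirrors the patched decision: cur_cpu is the CPU running the completion
 * path, submit_cpu plays ctx->cpu, and bpf_ccpu is the program's result
 * (-1 when no program ran). Returns the CPU that would receive the IPI via
 * smp_call_function_single_async(), or -1 to complete on cur_cpu.
 */
static int pick_completion_cpu(int cur_cpu, int submit_cpu, int bpf_ccpu,
			       bool same_force)
{
	bool shared = false;
	int ccpu;

	assert(cur_cpu >= 0 && cur_cpu < NR_CPUS_MODEL);
	assert(submit_cpu >= 0 && submit_cpu < NR_CPUS_MODEL);

	if (bpf_ccpu < 0 || bpf_ccpu >= NR_CPUS_MODEL ||
	    !cpu_online_model[bpf_ccpu]) {
		/* No usable BPF answer: fall back to the submitting CPU. */
		ccpu = submit_cpu;
		if (!same_force)
			shared = cpus_share_cache_model(cur_cpu, submit_cpu);
	} else {
		/* BPF choice wins; the cache-sharing check is skipped. */
		ccpu = bpf_ccpu;
	}

	if (cur_cpu != ccpu && !shared && cpu_online_model[ccpu])
		return ccpu;
	return -1;
}
```

For example, pick_completion_cpu(0, 2, -1, false) completes locally because CPUs 0 and 2 share a cache. pick_completion_cpu(0, 2, 3, false) redirects to CPU 3. As in the patch, a valid BPF choice bypasses the cpus_share_cache() test, so completion moves even to a CPU that shares a cache with the current one.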