On 08/25/2017 08:35 PM, Eric Leblond wrote: [...]
> OK, this looks like what we were already doing in Suricata, so it should be fine. If I understand the design correctly, we will have per-CPU load balancing: the CPU reading the packet will send data to its own ring buffer via bpf_perf_event_output(), which doesn't take any CPU-related parameter. As we are really early in the processing, this means that
Yeah, if you look at __bpf_perf_event_output(), it's basically the event->oncpu != cpu check which makes it bail out if the perf event isn't bound to the current CPU; iiuc that restriction is what ensures the RB can be written without having to take locks. The CPU-related 'parameter' is basically set up by the 'orchestrator': you have the perf event map, and depending on the index at which you place the corresponding perf fd, you can either use BPF_F_CURRENT_CPU if the mapping is 1:1 (cpu -> perf fd set up for this cpu), or a custom index if you have a use case where you need to demux to one of multiple perf RBs for that CPU.
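To make that concrete, a minimal BPF-side sketch of the 1:1 layout (the 'events' map name, its size, and the XDP attach point are assumptions for illustration, not Suricata's actual code):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
	__uint(max_entries, 128);	/* >= number of possible CPUs */
} events SEC(".maps");

SEC("xdp")
int push_sample(struct xdp_md *ctx)
{
	__u64 sample = 0;	/* placeholder payload */

	/* BPF_F_CURRENT_CPU selects the map slot whose index equals
	 * the CPU this program runs on, i.e. the 1:1 cpu -> perf fd
	 * case. Passing an explicit index (0..max_entries-1) instead
	 * is the demux case. */
	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
			      &sample, sizeof(sample));
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";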
> the per-CPU load balancing will be done by the card.
Right, given you need to have the replies steered into the same perf RB 'channel' for further processing.
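A rough sketch of what the 'orchestrator' side could look like under the same assumptions: one perf fd opened per CPU and stored at index == cpu in the map, so BPF_F_CURRENT_CPU resolves to the RB owned by the writing CPU (the helper names here are hypothetical, error handling trimmed):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <bpf/bpf.h>

/* Hypothetical helper: open a PERF_COUNT_SW_BPF_OUTPUT event pinned
 * to one CPU. */
static int open_perf_fd(int cpu)
{
	struct perf_event_attr attr = {
		.size		= sizeof(attr),
		.type		= PERF_TYPE_SOFTWARE,
		.config		= PERF_COUNT_SW_BPF_OUTPUT,
		.sample_type	= PERF_SAMPLE_RAW,
		.wakeup_events	= 1,
	};

	/* pid == -1, cpu == <cpu>: this RB belongs to exactly one CPU. */
	return syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}

/* Fill the perf event map with the 1:1 cpu -> perf fd mapping. */
static int setup_events_map(int map_fd, int nr_cpus)
{
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		int fd = open_perf_fd(cpu);

		if (fd < 0)
			return -1;
		/* Placing the fd at a different index here is what
		 * enables the demux case mentioned above. */
		if (bpf_map_update_elem(map_fd, &cpu, &fd, BPF_ANY))
			return -1;
	}
	return 0;
}

Since only the CPU that owns a given RB ever writes to it, the bpf_perf_event_output() call on the BPF side stays lockless, which is exactly the property the event->oncpu check enforces; the reader would then mmap() each fd and consume samples per CPU.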