On 08/16, Toshiaki Makita wrote:
> On 2019/08/16 0:21, Stanislav Fomichev wrote:
> > On 08/15, Toshiaki Makita wrote:
> > > On 2019/08/15 2:07, Stanislav Fomichev wrote:
> > > > On 08/13, Toshiaki Makita wrote:
> > > > > * Implementation
> > > > >
> > > > > xdp_flow makes use of UMH to load an eBPF program for XDP, similar
> > > > > to bpfilter. The difference is that xdp_flow does not generate the
> > > > > eBPF program dynamically but a prebuilt program is embedded in UMH.
> > > > > This is mainly because flow insertion is quite frequent. If we
> > > > > generate and load an eBPF program on each insertion of a flow, the
> > > > > latency of the first packet of ping in the above test will
> > > > > increase, which I want to avoid.
> > > > Can this instead be implemented with a new hook that is called for
> > > > TC events? This hook can write to a perf event buffer and the
> > > > control plane will insert/remove/modify flow tables in the BPF maps
> > > > (the control plane will also install the XDP program).
> > > >
> > > > Why do we need UMH? What am I missing?
> > >
> > > So you suggest doing everything in the xdp_flow kmod?
> > You probably don't even need the xdp_flow kmod. Add a new tc "offload"
> > mode (bypass) that dumps every command via netlink (or calls the BPF
> > hook where you can dump it into a perf event buffer) and then read
> > that info from userspace, install XDP programs and modify flow tables.
> > I don't think you need any kernel changes besides that stream of data
> > from the kernel about qdisc/tc flow creation/removal/etc.
>
> My intention is to let more people who want high-speed networking easily
> use XDP, hence the transparent XDP offload behind the current TC
> interface.
>
> What userspace program would monitor TC events with your suggestion?
Have a new system daemon (xdpflowerd) that is independently
packaged/shipped/installed. Anybody who wants accelerated TC can
download/install it. OVS can be completely unaware of this.

> ovs-vswitchd? If so, it does not even need to monitor TC. It can
> implement XDP offload directly.
> (However, I prefer a kernel solution. Please refer to the "About
> alternative userland (ovs-vswitchd etc.) implementation" section in the
> cover letter.)
>
> Also, such a TC monitoring solution can easily get out of sync with real
> TC behavior, as TC filter/flower is being heavily developed and changed,
> e.g. the introduction of TC blocks, support for multiple masks with the
> same pref, etc.
> I'm not sure such an unreliable solution has much value.
This same issue applies to the in-kernel implementation, doesn't it?
What happens if somebody sends patches for a new flower feature but
doesn't add the appropriate xdp support? Do we reject them?

That's why I'm suggesting to move this problem to the userspace :-)
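
Roughly, the daemon I have in mind would look something like the sketch
below. Completely untested; the pin path, the map handling and the flower
attribute parsing are made up, it's only meant to show the shape of it:

/* Sketch: subscribe to rtnetlink tc notifications and mirror flower
 * add/del into a BPF map pinned by the daemon. Attaching the XDP
 * program, parsing the flower attributes and error handling are all
 * left out.
 */
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <bpf/bpf.h>	/* bpf_obj_get(), bpf_map_update_elem() */

int main(void)
{
	struct sockaddr_nl sa = {
		.nl_family = AF_NETLINK,
		.nl_groups = RTMGRP_TC,	/* RTM_{NEW,DEL}TFILTER et al. */
	};
	char buf[8192];
	int nlsk, map_fd;

	nlsk = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	bind(nlsk, (struct sockaddr *)&sa, sizeof(sa));

	/* hypothetical pin path for the XDP flow table */
	map_fd = bpf_obj_get("/sys/fs/bpf/xdp_flow/flow_table");

	for (;;) {
		int len = recv(nlsk, buf, sizeof(buf), 0);
		struct nlmsghdr *nlh;

		for (nlh = (struct nlmsghdr *)buf; NLMSG_OK(nlh, len);
		     nlh = NLMSG_NEXT(nlh, len)) {
			switch (nlh->nlmsg_type) {
			case RTM_NEWTFILTER:
				/* parse flower key/mask/actions from the
				 * nested attributes, then:
				 * bpf_map_update_elem(map_fd, &key, &acts,
				 *		       BPF_ANY);
				 */
				break;
			case RTM_DELTFILTER:
				/* bpf_map_delete_elem(map_fd, &key); */
				break;
			}
		}
	}
	return 0;
}

All the policy (which devices to attach to, how to translate flower keys
into map entries) would live in that daemon, so a new flower feature means
a daemon update, not a kernel change.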
> > But, I haven't looked at the series deeply, so I might be missing
> > something :-)
> >
> > > I also thought about that. There are two phases, so let's think about
> > > them separately.
> > >
> > > 1) TC block (qdisc) creation / eBPF load
> > >
> > > I saw eBPF maintainers repeatedly saying eBPF program loading needs
> > > to be done from userland, not from the kernel, to run the verifier
> > > for safety. However, the xdp_flow eBPF program is prebuilt and
> > > embedded in the kernel, so we may allow such programs to be loaded
> > > from the kernel? I currently don't have the will to make such an API
> > > as loading can be done with the current UMH mechanism.
> > >
> > > 2) flow insertion / eBPF map update
> > >
> > > Not sure if this needs to be done from userland. One concern is that
> > > eBPF maps can be modified by unrelated processes and we need to
> > > handle all unexpected states of the maps. Such handling tends to be
> > > difficult and may cause unexpected kernel behavior. OTOH updating
> > > maps from the kmod may reduce the latency of flow insertion
> > > drastically.
> > Latency from the moment I type 'tc filter add ...' to the moment the
> > rule is installed into the maps? Does it really matter?
>
> Yes it matters. Flow insertion is effectively part of the data path in
> OVS.
> Please see how ping latency is affected in the cover letter.
Ok, but what I'm suggesting shouldn't be less performant. We are talking
about UMH writing into a pipe vs writing TC events into netlink (see the
rough sketch at the end of this mail).

> > Do I understand correctly that both of those events (qdisc creation
> > and flow insertion) are triggered from tcf_block_offload_cmd (or
> > similar)?
>
> Both the eBPF load and the map update are triggered from
> tcf_block_offload_cmd.
> I think you understand it correctly.
>
> Toshiaki Makita
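
And for completeness, the per-flow update path I have in mind on the
userspace side. The struct layouts are made up for illustration; the
point is only that a flow insertion is a single bpf(2) call against a
pinned map, with no program reload and no verifier run:

#include <linux/types.h>
#include <linux/if_ether.h>
#include <bpf/bpf.h>

struct flow_key {		/* made-up layout */
	__u8   dst_mac[ETH_ALEN];
	__u8   src_mac[ETH_ALEN];
	__be16 eth_type;
	__u16  vlan_id;
};

struct flow_actions {		/* made-up layout */
	__u32 out_ifindex;
	__u32 flags;
};

static int insert_flow(int map_fd, const struct flow_key *key,
		       const struct flow_actions *acts)
{
	/* one syscall per flow insertion */
	return bpf_map_update_elem(map_fd, key, acts, BPF_NOEXIST);
}

Cost-wise that should be in the same ballpark as the kmod writing the
request into the UMH pipe.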