Florian Westphal <fw@xxxxxxxxx> writes:

> This adds a small internal mapping table so that a new bpf (xdp) kfunc
> can perform lookups in a flowtable.
>
> I have no intent to push this without nft integration of the xdp program,
> this RFC is just to get comments on the general direction because there
> is a chicken/egg issue:
>
> As-is, xdp program has access to the device pointer, but no way to do a
> lookup in a flowtable -- there is no way to obtain the needed struct
> without whacky stunts.

So IIUC, this would all be controlled by userspace anyway (by the nft
binary), right? In which case, couldn't userspace also provide the
reference to the right flowtable instance, by sticking it into a bpf
map? We'd probably need some special handling on the UAPI side to insert
a flowtable pointer, but from the BPF side it could just look like a
kptr in a map that the program pulls out and passes to the lookup kfunc.
And the map would take a refcnt, making sure the table doesn't disappear
underneath the XDP program. It could even improve performance since
there would be one less hashtable lookup.

The drawback would be that this would make it harder to integrate into
other XDP data planes, as you'd need to coordinate with nft to keep the
right flowtable references alive even if nft doesn't control the XDP
program. But maybe that's doable, somehow?

[...]

> My thinking is to add a xdp-offload flag to the nft grammer only.
> Its not needed on nf uapi side and it would tell nft to attach the xdp
> flowtable forward program to the devices listed in the flowtable.
>
> Also, packet flow is altered (qdiscs is bypassed), which is a strong
> argument against default-usage.

I agree that at this point XDP has too many quirks to be something we
can turn on by default. However, I think we should support XDP data
planes that are not necessarily under the control of nft itself.
Specifically, I am planning to add an 'xdp-forward' utility to xdp-tools
which would enable a semi-automatic XDP fast path using both this and
other hooks like the fib lookup helper. So it would be nice to make the
different pieces as loosely coupled as is practical (cf. what I wrote
above).

> Open questions:
>
> Do we need to support dev-in-multiple flowtables? I would like to
> avoid this, this likely means the future "xdp" flag in nftables would
> be restricted to "inet" family. Alternative would be to change the key to
> 'device address plus protocol family', the xdp prog could derive that from the
> packet data.

We can always start with the simple case and add more options later if
it turns out to be useful? With kfuncs we do have some flexibility in
terms of adjusting the API (although I think we should strive for
keeping it as stable as we can).

> Timeout handling. Should the XDP program even bother to refresh the
> flowtable timeout?
>
> It might make more sense to intentionally have packets
> flow through the normal path periodically so neigh entries are up to
> date.

Hmm, I see what you mean, but I worry that this would lead to some nasty
latency blips when a flow transitions back and forth between kernel and
XDP paths. Also, there's a reordering problem as the state is changed:
the first packet goes through the stack, sets the flow state to active,
then gets transmitted. But while that packet sits in the qdisc waiting
to go out on the wire, the next packet arrives, gets handled by the XDP
fastpath and ends up overtaking the first packet on the TX side. Not
sure we have a good solution for this in general :(

-Toke