On 19/08/20 (Tue) 3:15:46, Jakub Kicinski wrote:
I'm on vacation and replying slowly. Sorry for any inconvenience.
On Sat, 17 Aug 2019 23:01:59 +0900, Toshiaki Makita wrote:
On 19/08/17 (Sat) 3:52:24, Jakub Kicinski wrote:
On Fri, 16 Aug 2019 10:28:10 +0900, Toshiaki Makita wrote:
On 2019/08/16 4:22, Jakub Kicinski wrote:
There's a certain allure in bringing the in-kernel BPF translation
infrastructure forward. OTOH from a system architecture perspective IMHO
it does seem like a task best handled in user space. bpfilter can replace
iptables completely; here we're looking at an acceleration relatively
loosely coupled with flower.
I don't think it's loosely coupled. Emulating TC behavior in userspace
is not so easy.
Think about the recent multi-mask support in flower. Previously, userspace
could assume there was one mask and one hash table per preference (pref) in
TC. After the change, TC accepts different masks under the same pref. Such a
change tends to break a userspace emulation: it may ignore the mask passed
at flow insertion and keep using the mask remembered when the first flow of
that pref was inserted; it may override the mask of all existing flows with
that pref; or it may fail to insert such flows. Any of these would result in
unexpected, wrong datapath handling, which is critical.
I think such an emulation layer needs to be updated in sync with TC.
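
To make the failure mode concrete, here is a minimal made-up sketch (all
names are invented, not actual flower or xdp_flow code) of an emulator built
on the old one-mask-per-pref assumption; a second mask under the same pref
is silently mishandled:

/* Hypothetical userspace emulator that bakes in the old "one mask per
 * pref" assumption. All names are invented for illustration. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct flow_mask {
    uint32_t dst_ip;    /* which bits of the IPv4 dst addr to match */
    uint16_t dst_port;  /* which bits of the dst port to match */
};

struct pref_table {
    bool in_use;
    uint16_t pref;
    struct flow_mask mask;  /* remembered from the FIRST insertion */
    /* a hash table of flows keyed under 'mask' would live here */
};

static struct pref_table tables[16];

static int emu_flow_insert(uint16_t pref, const struct flow_mask *mask)
{
    for (size_t i = 0; i < sizeof(tables) / sizeof(tables[0]); i++) {
        struct pref_table *t = &tables[i];

        if (t->in_use && t->pref == pref) {
            /* BUG with multi-mask TC: the mask passed with this
             * flow is ignored and the remembered one is used, so
             * the emulated datapath diverges from TC's behavior. */
            if (memcmp(&t->mask, mask, sizeof(*mask)))
                fprintf(stderr, "pref %u: new mask ignored\n",
                        (unsigned)pref);
            return 0;
        }
        if (!t->in_use) {
            t->in_use = true;
            t->pref = pref;
            t->mask = *mask;    /* remember the first mask */
            return 0;
        }
    }
    return -1;  /* table full */
}

int main(void)
{
    struct flow_mask m1 = { .dst_ip = 0xffffffff, .dst_port = 0 };
    struct flow_mask m2 = { .dst_ip = 0xffffff00, .dst_port = 0xffff };

    emu_flow_insert(1, &m1);    /* fine: first flow defines the mask */
    emu_flow_insert(1, &m2);    /* multi-mask pref: silently mishandled */
    return 0;
}

The other failure modes mentioned above (overriding the stored mask, or
rejecting the insert) would be just as wrong; the emulator has to track
flower's semantics exactly.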
Oh, so you're saying that if xdp_flow is merged, all patches to
cls_flower and netfilter which affect flow offload will be required
to update xdp_flow as well?
Hmm... are you saying that we are allowed to break another in-kernel
subsystem with some change? Sounds strange...
No I'm not saying that, please don't put words in my mouth.
If we ignore xdp_flow when modifying something which affects flow
offload, that may cause breakage; I showed such an example using the
multi-mask support. So I just wondered what you meant, and guessed you
think we can break another subsystem in some situations.
I admit I should not have used the wording "you are saying...?". If it
was unpleasant to you, I'm sorry about that. But I think you should not
use that wording either. I did not say that all patches to cls_flower and
netfilter which affect flow offload will be required to update xdp_flow.
I guess most patches which affect the flow offload core will not break
xdp_flow. In some cases breakage may happen; in that case we need to fix
xdp_flow as well. I'm asking you if that's your intention.
Having an implementation not support a feature of another implementation
and degrade gracefully to the slower one is not necessarily breakage.
We need to make a conscious decision here, hence the clarifying question.
As I described above, breakage can happen in some cases, and if a patch
breaks xdp_flow I think we need to fix xdp_flow at the same time. If
xdp_flow does not support newly added features but still works for existing
ones, that is OK. In the first place, not all features can be offloaded to
xdp_flow. I think this is the same situation as HW-offload.
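
To make that fallback concrete, a minimal sketch (all names invented, not
from the xdp_flow patches): a flow the XDP program cannot express simply
stays on the slower TC software datapath rather than being rejected:

/* Hypothetical illustration of graceful degradation; names are invented. */
#include <stdbool.h>
#include <stdio.h>

struct emu_flow {
    bool key_is_supported;     /* e.g. matches only on fields XDP parses */
    bool action_is_supported;  /* e.g. a simple redirect action */
};

/* Offload only flows whose every key and action the XDP program can
 * express; everything else keeps using the TC software path. */
static const char *pick_datapath(const struct emu_flow *f)
{
    if (f->key_is_supported && f->action_is_supported)
        return "XDP";
    return "TC software (slower, but complete)";
}

int main(void)
{
    struct emu_flow simple = { true, true };
    struct emu_flow exotic = { true, false };

    printf("simple flow -> %s\n", pick_datapath(&simple));
    printf("exotic flow -> %s\n", pick_datapath(&exotic));
    return 0;
}

The point is that an unsupported flow degrades to the slower path but still
behaves correctly, exactly as with HW-offload.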
That's a question of policy. Technically the implementation in user
space is equivalent.
The advantage of a user space implementation is that you can add more
to it and explore use cases which do not fit in the flow offload API,
but are trivial for BPF. Not to mention the obvious advantage of
decoupling the upgrade path.
I understand the advantage, but I can't trust such a third-party kernel
emulation solution for this kind of thing, which handles the critical data
path.
That's a strange argument to make. All production data path BPF today
comes from user space.
Probably my explanation was not sufficient. What I'm concerned about is
that this needs to emulate kernel behavior, and that is difficult. I'm not
saying that a userspace-defined datapath itself is unreliable, nor that
the eBPF ecosystem is.
Toshiaki Makita