Re: [RFC PATCH v2 bpf-next 00/15] xdp_flow: Flow offload to XDP

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Mon, 25 Nov 2019 14:03:14 +0100

Toshiaki Makita <toshiaki.makita1@xxxxxxxxx> writes:

> On 2019/11/22 20:54, Toke Høiland-Jørgensen wrote:
>> Toshiaki Makita <toshiaki.makita1@xxxxxxxxx> writes:
>> 
>>> On 2019/11/18 19:20, Toke Høiland-Jørgensen wrote:
>>>> Toshiaki Makita <toshiaki.makita1@xxxxxxxxx> writes:
>>>>
>>>> [... trimming the context a bit ...]
>>>>
>>>>>>>> Take your example of TC rules: You were proposing a flow like this:
>>>>>>>>
>>>>>>>> Userspace TC rule -> kernel rule table -> eBPF map -> generated XDP
>>>>>>>> program
>>>>>>>>
>>>>>>>> Whereas what I mean is that we could do this instead:
>>>>>>>>
>>>>>>>> Userspace TC rule -> kernel rule table
>>>>>>>>
>>>>>>>> and separately
>>>>>>>>
>>>>>>>> XDP program -> bpf helper -> lookup in kernel rule table
>>>>>>>
>>>>>>> Thanks, now I see what you mean.
>>>>>>> You expect an XDP program like this, right?
>>>>>>>
>>>>>>> int xdp_tc(struct xdp_md *ctx)
>>>>>>> {
>>>>>>> 	int act = bpf_xdp_tc_filter(ctx);
>>>>>>> 	return act;
>>>>>>> }
>>>>>>
>>>>>> Yes, basically, except that the XDP program would need to parse the
>>>>>> packet first, and bpf_xdp_tc_filter() would take a parameter struct with
>>>>>> the parsed values. See the usage of bpf_fib_lookup() in
>>>>>> bpf/samples/xdp_fwd_kern.c
>>>>>>
>>>>>>> But doesn't this way lose a chance to reduce/minimize the program to
>>>>>>> only use necessary features for this device?
>>>>>>
>>>>>> Not necessarily. Since the BPF program does the packet parsing and fills
>>>>>> in the TC filter lookup data structure, it can limit what features are
>>>>>> used that way (e.g., if I only want to do IPv6, I just parse the v6
>>>>>> header, ignore TCP/UDP, and drop everything that's not IPv6). The lookup
>>>>>> helper could also have a flag argument to disable some of the lookup
>>>>>> features.
>>>>>
>>>>> It's unclear to me how to configure that.
>>>>> Use options when attaching the program? Something like
>>>>> $ xdp_tc attach eth0 --only-with ipv6
>>>>> But can users always determine their necessary features in advance?
>>>>
>>>> That's what I'm doing with xdp-filter now. But the answer to your second
>>>> question is likely to be 'probably not', so it would be good to not have
>>>> to do this :)
>>>>
>>>>> Frequent manual reconfiguration when TC rules frequently changes does
>>>>> not sound nice. Or, add hook to kernel to listen any TC filter event
>>>>> on some daemon and automatically reload the attached program?
>>>>
>>>> Doesn't have to be a kernel hook; we could enhance the userspace tooling
>>>> to do it. Say we integrate it into 'tc':
>>>>
>>>> - Add a new command 'tc xdp_accel enable <iface> --features [ipv6,etc]'
>>>> - When adding new rules, add the following logic:
>>>>     - Check if XDP acceleration is enabled
>>>>     - If it is, check whether the rule being added fits into the current
>>>>       'feature set' loaded on that interface.
>>>>       - If the rule needs more features, reload the XDP program to one
>>>>         with the needed additional features.
>>>>       - Or, alternatively, just warn the user and let them manually
>>>>         replace it?
>>>
>>> Ok, but there are other userspace tools to configure tc in wild.
>>> python and golang have their own netlink library project.
>>> OVS embeds TC netlink handling code in itself. There may be more tools like this.
>>> I think at least we should have rtnl notification about TC and monitor it
>>> from daemon, if we want to reload the program from userspace tools.
>> 
>> A daemon would be one way to do this in cases where it needs to be
>> completely dynamic. My guess is that there are lots of environments
>> where that is not required, and where a user/administrator could
>> realistically specify ahead of time which feature set they want to
>> enable XDP acceleration for. So in my mind the way to go about this is
>> to implement the latter first, then add dynamic reconfiguration of it on
>> top when (or if) it turns out to be necessary...
>
> Hmm, but I think there is big difference between a daemon and a cli tool.
> Shouldn't we determine the design considering future usage?

Sure, we should make sure the design doesn't exclude either option. But
we also shouldn't end up in a "the perfect is the enemy of the good"
type of situation. And the kernel-side changes are likely to be somewhat
independent of what the userspace management ends up looking like...

-Toke