Re: Different packet handling after bpf_redirect_map with BPF_F_BROADCAST

Florian Kauer <florian.kauer@xxxxxxxxxxxxx> · Thu, 4 Jul 2024 14:00:43 +0200

Hi Toke,
thanks a lot for the prompt response!

On 7/4/24 13:20, Toke Høiland-Jørgensen wrote:
> Florian Kauer <florian.kauer@xxxxxxxxxxxxx> writes:
> 
>> Hi,
>> we are currently using bpf_redirect_map with BPF_F_BROADCAST to replicate frames for sending traffic over redundant paths.
>>
>> See for example https://www.rfc-editor.org/rfc/rfc8655.html#section-3.2.2.2 for background
>> and https://github.com/EricssonResearch/xdpfrer/blob/5f0845cb2e4c4da325f0e77df3020526ad992aff/src/xdpfrer.bpf.c#L393 for the current implementation.
>>
>> However, we want to modify the frame after the replication. In the easiest case this means to change the VLAN tag to route the traffic over different VLANs. This is currently done by taking a different egress_ifindex into account after the replication and that works well so far ( https://github.com/EricssonResearch/xdpfrer/blob/5f0845cb2e4c4da325f0e77df3020526ad992aff/src/xdpfrer.bpf.c#L399 ).
>>
>> BUT there are cases where the egress_interface for both replicated packets shall be the same and the different path of the replicated frames is only taken on a subsequent switch based on a different VLAN tag. So how could the XDP program differentiate between the different replicated frames if the egress_interface is the same?
>>
>> So my basic idea would be to add two (or more) DEVMAP entries with the same ifindex into the same map. And then either
>>
>> 1. Add several xdp/devmap progs to the same loaded bpf and reference them in the DEVMAP entry, like
>>
>> SEC("xdp/devmap")
>> int replicate_postprocessing_first(struct xdp_md *pkt)
>> {
>>     int ret = change_vlan(pkt, 0, true);
>>     ...
>> }
>>
>> SEC("xdp/devmap")
>> int replicate_postprocessing_second(struct xdp_md *pkt)
>> {
>>     int ret = change_vlan(pkt, 1, true);
>>     ...
>> }
>>
>> This, however, would be quite inflexible.
> 
> Having multiple entries in the devmap entry corresponds roughly to how
> the stack handles VLANs. I.e., when configuring a VLAN, you create a new
> netdevice (which you would then put into the devmap). Unfortunately, XDP
> doesn't really know how to deal with stacked devices like VLANs, so you
> can't actually add a VLAN device into a devmap. But creating an
> interface for this would be one way of dealing with a situation like
> this, without having to hardcode things into a BPF program.

I see. That would be very nice in general, but for our specific application
referring to a different VLAN interface is likely still too inflexible
(e.g. in addition to changing the VLAN tag we might also want to
add/remove/modify MPLS labels and other headers).

> 
>> 2. Load the same bpf several times without attaching it to an
>> interface and set e.g. a const to a different value after loading.
> 
> This would work, I think. Something like:
> 
> const volatile int vlan_id = 1;
> 
> SEC("xdp/devmap")
> int replicate_postprocessing_second(struct xdp_md *pkt)
> {
>     int ret = change_vlan(pkt, vlan_id, true);
>     ...
> }
> 
> and then the loader would replace the value of vlan_id before loading;
> using skeletons this would look something like:
> 
> skel = xdp_program_skeleton__open();
> skel->rodata->vlan_id = 2;
> xdp_program_skeleton__load(skel);
> 
> /* attach to devmap */

Yes, that is exactly what I was imagining, thanks!

> 
>> But can I even reference a xdp/devmap prog from a different loaded
>> bpf, especially when it is not attached?
> 
> Why do you need to reference it from a different BPF program? The
> userspace program just attaches it to the right devmap entry?

What I wanted to imply with this is that the lifetime of the different BPF
programs is unclear to me. So AFAIK (but I might be totally wrong) an
XDP program has the lifetime of the process that loaded the program
(i.e. called xdp_program_skeleton__load()) so it is destroyed/unloaded
as soon as the process ends, UNLESS it is explicitly attached to an interface,
in which case it inherits the lifetime of the interface it was attached to
(i.e. might outlive the loading process).

If I do what you sketched above, where we do not attach the program explicitly
to an interface but only directly to the devmap, does it then inherit the
lifetime of the interface of the BPF program the devmap belongs to, or is
it destroyed as soon as the loading process ends?

The latter would invalidate the bpf_prog inside the devmap (very bad)
while the former seems very complex to handle (especially since I would expect
that I can attach the same loaded program to different devmaps of different
interfaces). But you have probably already tackled this complexity
successfully :-)

> 
>> 3. Extend the kernel with a way to let the xdp/devmap prog know from
>> which DEVMAP entry its execution originates (like an additional entry
>> in the bpf_devmap_val that is then set in the xdp_md).
> 
> This could be useful in any case, so I would personally be fine with
> adding something like this (for both devmap and cpumap) :)

Would you prefer a simple u32 (or similar) that could then be used as
index for an array or a more complex data structure/void* to fill
with arbitrary data?

The former would require an additional lookup (which is OK, but adds a little
overhead), but for the latter it is not clear to me where the data would
actually be located in memory...
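
To make the simple u32 variant a bit more concrete, here is a rough
userspace model of what I have in mind. This is only a sketch: struct
replica_cfg, replica_table and lookup_replica_cfg() are hypothetical names,
the table stands in for a BPF array map filled by userspace, and entry_id
stands in for the value the kernel would have to pass to the xdp/devmap prog
(which does not exist today).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-replica rewrite parameters; not an existing kernel struct. */
struct replica_cfg {
    uint16_t vlan_id;    /* VLAN ID to write into the replicated frame */
    uint8_t  push_mpls;  /* nonzero if an MPLS label should be added */
    uint32_t mpls_label; /* only meaningful when push_mpls is set */
};

#define MAX_REPLICAS 4

/* Stand-in for a BPF array map that userspace would fill in. */
static const struct replica_cfg replica_table[MAX_REPLICAS] = {
    [0] = { .vlan_id = 100 },
    [1] = { .vlan_id = 200, .push_mpls = 1, .mpls_label = 16 },
};

/*
 * entry_id models the u32 that would identify the originating DEVMAP entry.
 * Returns NULL for an out-of-range id, similar to a failed map lookup.
 */
static const struct replica_cfg *lookup_replica_cfg(uint32_t entry_id)
{
    if (entry_id >= MAX_REPLICAS)
        return NULL;
    return &replica_table[entry_id];
}
```

The extra cost is the one bounds-checked lookup per replicated frame; in
exchange, the per-replica data can be changed from userspace without
reloading anything.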

> 
> -Toke
> 

Thanks,
Florian