Hangbin Liu <liuhangbin@xxxxxxxxx> writes:

> Hi all,
>
> This patchset is for XDP multicast support, which has been discussed
> before[0]. The goal is to be able to implement an OVS-like data plane
> in XDP, i.e., a software switch that can forward XDP frames to
> multiple ports.
>
> To achieve this, an application needs to specify a group of
> interfaces to forward a packet to. It is also common to want to
> exclude one or more physical interfaces from the forwarding operation
> - e.g., to forward a packet to all interfaces in the multicast group
> except the interface it arrived on. While this could be done simply
> by adding more groups, this quickly leads to a combinatorial
> explosion in the number of groups an application has to maintain.
>
> To avoid the combinatorial explosion, we propose to include the
> ability to specify an "exclude group" as part of the forwarding
> operation. This needs to be a group (instead of just a single port
> index), because a physical interface can be part of a logical
> grouping, such as a bond device.
>
> Thus, the logical forwarding operation becomes a "set difference"
> operation, i.e. "forward to all ports in group A that are not also in
> group B". This series implements such an operation using device maps
> to represent the groups. This means that the XDP program specifies
> two device maps, one containing the list of netdevs to redirect to,
> and the other containing the exclude list.
>
> To achieve this, I re-implement a new helper, bpf_redirect_map_multi(),
> which accepts two maps: the forwarding map and the exclude map. If
> users don't want to use an exclude map and simply want to stop
> redirecting back to the ingress device, they can use the flag
> BPF_F_EXCLUDE_INGRESS.
>
> The example in patch 2 is functional, but not a lot of effort has
> been made on performance optimisation. I did a simple test (pkt size
> 64) with pktgen. Here is the test result with BPF_MAP_TYPE_DEVMAP_HASH
> arrays:
>
> bpf_redirect_map() with 1 ingress, 1 egress:
>   generic path: ~1600k pps
>   native path: ~980k pps
>
> bpf_redirect_map_multi() with 1 ingress, 3 egress:
>   generic path: ~600k pps
>   native path: ~480k pps
>
> bpf_redirect_map_multi() with 1 ingress, 9 egress:
>   generic path: ~125k pps
>   native path: ~100k pps
>
> bpf_redirect_map_multi() is slower than bpf_redirect_map() because we
> loop over the maps and clone the skb/xdpf. The native path is slower
> than the generic path because we send skbs with pktgen. So the result
> looks reasonable.

How are you running these tests? Still on virtual devices? We really
need results from a physical setup in native mode to assess the impact
on the native-XDP fast path. The numbers above don't tell us much in
this regard.

I'd also like to see a before/after-patch comparison for straight
bpf_redirect_map(), since you're touching the fast path, and we want to
make sure it's not causing a performance regression for regular
redirect.

Finally, since the overhead seems to be quite substantial: a comparison
with a regular network stack bridge might make sense? After all, we
also want to make sure it's a performance win over that :)

-Toke
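
[Editor's illustration: a minimal sketch of how an XDP program might
invoke the helper described in the cover letter. The exact helper
signature, map sizes, and section names are assumptions based on the
description above (two devmaps plus a flags argument); the declaration
of bpf_redirect_map_multi() would have to come from headers generated
against the patched tree. See patch 2 of the series for the actual
example program.]

/* xdp_multicast_sketch.c - hedged sketch, not the series' code. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Forwarding group: all ports the packet should be replicated to. */
struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
	__type(key, int);
	__type(value, int);
	__uint(max_entries, 32);
} forward_map SEC(".maps");

/* Exclude group: ports to subtract from the forwarding group,
 * e.g. all members of the bond the packet arrived on. */
struct {
	__uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
	__type(key, int);
	__type(value, int);
	__uint(max_entries, 32);
} exclude_map SEC(".maps");

SEC("xdp")
int xdp_multicast(struct xdp_md *ctx)
{
	/* "Set difference" redirect: forward to every port in
	 * forward_map that is not in exclude_map, and never back out
	 * the ingress interface (BPF_F_EXCLUDE_INGRESS). Argument
	 * order is an assumption. */
	return bpf_redirect_map_multi(&forward_map, &exclude_map,
				      BPF_F_EXCLUDE_INGRESS);
}

char _license[] SEC("license") = "GPL";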