On 9/9/20 8:35 PM, Hangbin Liu wrote:
> Hi Alexei,
>
> On Wed, Sep 09, 2020 at 02:52:06PM -0700, Alexei Starovoitov wrote:
>> On Mon, Sep 07, 2020 at 04:27:21PM +0800, Hangbin Liu wrote:
>>> This patch is for xdp multicast support, which has been discussed
>>> before[0]. The goal is to be able to implement an OVS-like data plane in
>>> XDP, i.e., a software switch that can forward XDP frames to multiple ports.
>>>
>>> To achieve this, an application needs to specify a group of interfaces
>>> to forward a packet to. It is also common to want to exclude one or more
>>> physical interfaces from the forwarding operation - e.g., to forward a
>>> packet to all interfaces in the multicast group except the interface it
>>> arrived on. While this could be done simply by adding more groups, this
>>> quickly leads to a combinatorial explosion in the number of groups an
>>> application has to maintain.
>>>
>>> To avoid the combinatorial explosion, we propose to include the ability
>>> to specify an "exclude group" as part of the forwarding operation. This
>>> needs to be a group (instead of just a single port index), because a
>>> physical interface can be part of a logical grouping, such as a bond
>>> device.
>>>
>>> Thus, the logical forwarding operation becomes a "set difference"
>>> operation, i.e. "forward to all ports in group A that are not also in
>>> group B". This series implements such an operation using device maps to
>>> represent the groups. This means that the XDP program specifies two
>>> device maps, one containing the list of netdevs to redirect to, and the
>>> other containing the exclude list.
>>
>> "set difference" and BPF_F_EXCLUDE_INGRESS make sense to me as a high level api,
>> but I don't see how the program or helper is going to modify the packet
>> before multicasting it.
>> Even to implement a basic switch the program would need to modify destination
>> mac addresses before xmiting it on the device.
>> In case of XDP_TX the bpf program is doing it manually.
>> With this api the program is out of the loop.
>> It can prepare a packet for one target netdev, but sending the same
>> packet as-is to other netdevs isn't going to work correctly.
>
> Yes, we can't modify the packets on ingress as there are multiple egress ports
> and each one may have different requirements. So this helper will only forward
> the packets to other group (looks like a multicast group) devices.
>
> I think the packet modification (edit dst mac, add vlan tag, etc) should be
> done on egress, which relies on David's XDP egress support.

Agreed. The DEVMAP used for redirect can have programs attached that
update the packet headers - assuming you want to update them.

This is tagged as "multicast" support but it really is redirecting a
packet to multiple devices. One use case I see that evolves from this
set is the ability to both forward packets (e.g., host ingress to VM)
and grab a copy tcpdump style by redirecting packets to a virtual device
(similar to a patch set for dropwatch). I.e., no need for a perf-events
style copy to push to userspace.
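
For anyone following along, the "set difference" semantics from the cover
letter can be sketched in plain userspace C. This is only an illustration of
the selection logic, not the kernel implementation: `struct port_group`,
`group_contains()`, `select_egress()` and `MAX_PORTS` are hypothetical names
made up for this sketch, and the `exclude_ingress` argument merely mimics what
BPF_F_EXCLUDE_INGRESS is described as doing in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device map: a fixed-size array of ifindexes,
 * where 0 marks an empty slot. This is NOT the kernel's DEVMAP. */
#define MAX_PORTS 8

struct port_group {
	int ifindex[MAX_PORTS];
};

/* Return true if ifindex is present in the group (NULL group is empty). */
static bool group_contains(const struct port_group *g, int ifindex)
{
	if (!g)
		return false;
	for (size_t i = 0; i < MAX_PORTS; i++)
		if (g->ifindex[i] != 0 && g->ifindex[i] == ifindex)
			return true;
	return false;
}

/* Compute the egress set: every port in fwd that is not in excl and,
 * when exclude_ingress is set, is not the ingress port either.
 * Writes at most MAX_PORTS entries to out and returns how many. */
static size_t select_egress(const struct port_group *fwd,
			    const struct port_group *excl,
			    int ingress_ifindex, bool exclude_ingress,
			    int out[MAX_PORTS])
{
	size_t n = 0;

	assert(fwd != NULL && out != NULL);
	for (size_t i = 0; i < MAX_PORTS; i++) {
		int idx = fwd->ifindex[i];

		if (idx == 0)
			continue;	/* empty slot */
		if (group_contains(excl, idx))
			continue;	/* in group B: skip */
		if (exclude_ingress && idx == ingress_ifindex)
			continue;	/* do not send back out the ingress port */
		out[n++] = idx;
	}
	return n;
}
```

With a forward group of ifindexes {2, 3, 4, 5}, an exclude group of {4}, and a
packet arriving on ifindex 2 with ingress exclusion enabled, the sketch selects
ports 3 and 5. Per-port header rewriting is deliberately absent here, in line
with the point above that it belongs on egress.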