Hi Alexei,

On Wed, Sep 09, 2020 at 02:52:06PM -0700, Alexei Starovoitov wrote:
> On Mon, Sep 07, 2020 at 04:27:21PM +0800, Hangbin Liu wrote:
> > This patch is for xdp multicast support. which has been discussed
> > before[0], The goal is to be able to implement an OVS-like data plane in
> > XDP, i.e., a software switch that can forward XDP frames to multiple ports.
> >
> > To achieve this, an application needs to specify a group of interfaces
> > to forward a packet to. It is also common to want to exclude one or more
> > physical interfaces from the forwarding operation - e.g., to forward a
> > packet to all interfaces in the multicast group except the interface it
> > arrived on. While this could be done simply by adding more groups, this
> > quickly leads to a combinatorial explosion in the number of groups an
> > application has to maintain.
> >
> > To avoid the combinatorial explosion, we propose to include the ability
> > to specify an "exclude group" as part of the forwarding operation. This
> > needs to be a group (instead of just a single port index), because a
> > physical interface can be part of a logical grouping, such as a bond
> > device.
> >
> > Thus, the logical forwarding operation becomes a "set difference"
> > operation, i.e. "forward to all ports in group A that are not also in
> > group B". This series implements such an operation using device maps to
> > represent the groups. This means that the XDP program specifies two
> > device maps, one containing the list of netdevs to redirect to, and the
> > other containing the exclude list.
>
> "set difference" and BPF_F_EXCLUDE_INGRESS makes sense to me as high level api,
> but I don't see how program or helper is going to modify the packet
> before multicasting it.
> Even to implement a basic switch the program would need to modify destination
> mac addresses before xmiting it on the device.
> In case of XDP_TX the bpf program is doing it manually.
> With this api the program is out of the loop.
> It can prepare a packet for one target netdev, but sending the same
> packet as-is to other netdevs isn't going to to work correctly.

Yes, we can't modify the packets on ingress, as there are multiple egress
ports and each one may have different requirements. So this helper only
forwards the packets to the other devices in the group (which looks like a
multicast group). I think packet modification (editing the dst mac, adding
a vlan tag, etc.) should be done on egress, which relies on David's XDP
egress support.

> Veth-s and tap-s don't care about mac and the stack will silently accept
> packets even with wrong mac.
> The same thing may happen with physical netdevs. The driver won't care
> that dst mac is wrong. It will xmit it out, but the other side of the wire
> will likely drop that packet unless it's promisc.
> Properly implemented bridge shouldn't be doing it, but
> I really don't see how this api can work in practice to implement real bridge.
> What am I missing?

Not sure if I missed something. Does the current Linux bridge modify the
dst mac? I thought it only forwards packets (although it has an fdb
instead of flooding the packet to all ports).

Patch 4/5 has an example of forwarding packets. It still needs to get the
remote's mac address via arp/nd.

Thanks
Hangbin
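
For reference, the "set difference" forwarding semantics quoted above can be modelled in plain user-space C. This is only a sketch under stated assumptions: the two int arrays stand in for the forward and exclude device maps, and group_contains() and select_egress() are hypothetical names for illustration, not the kernel helper from this series. It selects egress ports only; it does not touch the packet, which matches the point that any per-port modification would have to happen on egress.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if ifindex is a member of group (a stand-in for a devmap). */
static bool group_contains(const int *group, size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (group[i] == ifindex)
			return true;
	return false;
}

/*
 * Hypothetical illustration, not the kernel API: copy into out[] every
 * ifindex from fwd[] that is not also in excl[], i.e. "group A minus
 * group B". Returns the number of egress ports selected, at most out_cap.
 */
static size_t select_egress(const int *fwd, size_t nfwd,
			    const int *excl, size_t nexcl,
			    int *out, size_t out_cap)
{
	size_t k = 0;

	for (size_t i = 0; i < nfwd && k < out_cap; i++)
		if (!group_contains(excl, nexcl, fwd[i]))
			out[k++] = fwd[i];
	return k;
}
```

With a forward group of ifindexes {2, 3, 4, 5} and an exclude group of {3, 5} (say, the ingress port and its bond peer), the selected egress ports are 2 and 4.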