On 17-04-27 04:31 PM, Alexei Starovoitov wrote: > On 4/27/17 1:41 AM, Jesper Dangaard Brouer wrote: >> When registering/attaching a XDP/bpf program, we would just send the >> file-descriptor for this port-map along (like we do with the bpf_prog >> FD). Plus, it own ingress-port number this program is in the port-map. >> >> It is not clear to me, in-which-data-structure on the kernel-side we >> store this reference to the port-map and ingress-port. As today we only >> have the "raw" struct bpf_prog pointer. I see several options: >> >> 1. Create a new xdp_prog struct that contains existing bpf_prog, >> a port-map pointer and ingress-port. (IMHO easiest solution) >> >> 2. Just create a new pointer to port-map and store it in driver rx-ring >> struct (like existing bpf_prog), but this create a race-challenge >> replacing (cmpxchg) the program (or perhaps it's not a problem as it >> runs under rcu and RTNL-lock). >> >> 3. Extend bpf_prog to store this port-map and ingress-port, and have a >> fast-way to access it. I assume it will be accessible via >> bpf_prog->bpf_prog_aux->used_maps[X] but it will be too slow for XDP. > > I'm not sure I completely follow the 3 proposals. > Are you suggesting to have only one netdev_array per program? > Why not to allow any number like we do for tailcall+prog_array, etc? > We can teach verifier to allow new helper > bpf_tx_port(netdev_array, port_num); > to only be used with netdev_array map type. > It will fetch netdevice pointer from netdev_array[port_num] > and will tx the packet into it. > We can make it similar to bpf_tail_call(), so that program will > finish on successful bpf_tx_port() or > make it into 'delayed' tx which will be executed when program finishes. > Not sure which approach is better. My reaction would be to make it finish on success but would like to write a few programs first and see. I can't think of any use _not_ to terminate but maybe there is something I'm missing. > > We can also extend this netdev_array into broadcast/multicast. Like > bpf_tx_allports(&netdev_array); > call from the program will xmit the packet to all netdevices > in that 'netdev_array' map type. Yep nice solution to the multicast problem. > > The map-in-map support can be trivially extended to allow netdev_array, > then the program can create N multicast groups of netdevices. > Each multicast group == one netdev_array map. > The user space will populate a hashmap with these netdev_arrays and > bpf kernel side can select dynamically which multicast group to use > to send the packets to. > bpf kernel side may look like: > struct bpf_netdev_array *netdev_array = bpf_map_lookup_elem(&hash, key); > if (!netdev_array) > ... > if (my_condition) > bpf_tx_allports(netdev_array); /* broadcast to all netdevices */ > else > bpf_tx_port(netdev_array, port_num); /* tx into one netdevice */ > > that's an artificial example. Just trying to point out > that we shouldn't restrict the feature too soon. > That is more or less what I was thinking as well. The other question I have though is should we have a bpf_redirect() call for the simple case where I use the ifindex directly. This will be helpful for taking existing programs from tc_cls into xdp. I think it makes sense to have both bpf_tx_allports(), bpf_tx_port(), and bpf_redirect(). .John