On 4/27/17 1:41 AM, Jesper Dangaard Brouer wrote:
> When registering/attaching an XDP/bpf program, we would just send the
> file-descriptor for this port-map along (like we do with the bpf_prog
> FD). Plus, its own ingress-port number for this program in the
> port-map.
>
> It is not clear to me in which data structure on the kernel side we
> store this reference to the port-map and ingress-port. As of today we
> only have the "raw" struct bpf_prog pointer. I see several options:
>
> 1. Create a new xdp_prog struct that contains the existing bpf_prog, a
>    port-map pointer and ingress-port. (IMHO easiest solution)
>
> 2. Just create a new pointer to the port-map and store it in the driver
>    rx-ring struct (like the existing bpf_prog), but this creates a
>    race-challenge when replacing (cmpxchg) the program (or perhaps
>    it's not a problem as it runs under rcu and RTNL-lock).
>
> 3. Extend bpf_prog to store this port-map and ingress-port, and have a
>    fast way to access it. I assume it will be accessible via
>    bpf_prog->bpf_prog_aux->used_maps[X] but it will be too slow for
>    XDP.

I'm not sure I completely follow the 3 proposals.
Are you suggesting to have only one netdev_array per program?
Why not allow any number, like we do for tailcall+prog_array, etc?

We can teach the verifier to allow a new helper
  bpf_tx_port(netdev_array, port_num);
to only be used with the netdev_array map type.
It will fetch the netdevice pointer from netdev_array[port_num]
and will tx the packet into it.

We can make it similar to bpf_tail_call(), so that the program will
finish on a successful bpf_tx_port(), or make it into a 'delayed' tx
which will be executed when the program finishes.
Not sure which approach is better.

We can also extend this netdev_array into broadcast/multicast. Like a
  bpf_tx_allports(&netdev_array);
call from the program will xmit the packet to all netdevices
in that 'netdev_array' map type.

The map-in-map support can be trivially extended to allow netdev_array,
then the program can create N multicast groups of netdevices.
Each multicast group == one netdev_array map.
The user space will populate a hashmap with these netdev_arrays and
the bpf kernel side can select dynamically which multicast group to use
to send the packets to.

The bpf kernel side may look like:

  struct bpf_netdev_array *netdev_array = bpf_map_lookup_elem(&hash, key);

  if (!netdev_array)
      ...
  if (my_condition)
      bpf_tx_allports(netdev_array);  /* broadcast to all netdevices */
  else
      bpf_tx_port(netdev_array, port_num); /* tx into one netdevice */

That's an artificial example. Just trying to point out
that we shouldn't restrict the feature too soon.
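
To make the intended semantics a bit more concrete, here is a rough user-space model of the two proposed helpers and the group lookup. It is only a sketch under stated assumptions: nothing here is kernel code, every mock_* name is hypothetical, the hashmap of netdev_arrays is modeled as a plain C array indexed by key, and "tx" just bumps a counter. The return values are also assumptions, since the proposal does not specify them.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PORTS 8

/* Hypothetical stand-in for a netdevice; it only counts transmitted packets. */
struct mock_netdev {
	const char *name;
	unsigned int tx_packets;
};

/* Hypothetical stand-in for the proposed netdev_array map: port -> device. */
struct mock_netdev_array {
	struct mock_netdev *ports[MAX_PORTS];
};

/* Models the proposed bpf_tx_port(): tx into one netdevice.
 * Assumed return values: 0 on success, -1 if the slot is invalid or empty. */
static int mock_tx_port(struct mock_netdev_array *arr, unsigned int port_num)
{
	assert(arr != NULL);
	if (port_num >= MAX_PORTS || arr->ports[port_num] == NULL)
		return -1;
	arr->ports[port_num]->tx_packets++;
	return 0;
}

/* Models the proposed bpf_tx_allports(): tx into every populated slot.
 * Assumed return value: the number of devices the packet was sent to. */
static int mock_tx_allports(struct mock_netdev_array *arr)
{
	int sent = 0;

	assert(arr != NULL);
	for (unsigned int i = 0; i < MAX_PORTS; i++) {
		if (arr->ports[i] != NULL) {
			arr->ports[i]->tx_packets++;
			sent++;
		}
	}
	return sent;
}

/* Mirrors the artificial example: pick a multicast group by key, then
 * either broadcast to the whole group or tx into a single port. */
static int mock_xdp_prog(struct mock_netdev_array *groups[], size_t ngroups,
			 size_t key, int broadcast, unsigned int port_num)
{
	struct mock_netdev_array *arr = key < ngroups ? groups[key] : NULL;

	if (arr == NULL)
		return -1;	/* lookup miss; the example leaves this case open */
	if (broadcast)
		return mock_tx_allports(arr);
	return mock_tx_port(arr, port_num);
}
```

In this model each entry of groups[] plays the role of one multicast group, so the "program" can pick a group at run time without having any single netdev_array bound to it.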