On 4/28/17 12:43 PM, Hannes Frederic Sowa wrote:
On 28.04.2017 07:30, Alexei Starovoitov wrote:On 4/27/17 10:06 PM, John Fastabend wrote:That is more or less what I was thinking as well. The other question I have though is should we have a bpf_redirect() call for the simple case where I use the ifindex directly. This will be helpful for taking existing programs from tc_cls into xdp. I think it makes sense to have both bpf_tx_allports(), bpf_tx_port(), and bpf_redirect().I think so too. Once netdevice is stored into netdev_array map the netdevice is pinned and we need to figure out what to do if somebody tries to delete it. Should we add a new netlink notifier that this netdev's refcnt is almost zero and it's only in netdev_array(s) ?We basically do that automatically in netdev_wait_allrefs: pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n", dev->name, refcnt); It is a very unpleasant warning and users probably think about a bug in the kernel at first. I don't think we should wait for user space to clean that up but have to do it automatically from the kernel. Maybe we can introduce a special value that basically NOPs the transmission. The hash table itself would install a netdevice notifier and would clean all tables. Could definitely cause some storm in the kernel, if a lot of keys are mapped to the same interface.or should it be deleted from the array(s) automatically and then user space will be notified post-deletion? Both approaches have their pros and cons.I am leaning more towards deleting it automatically. But walking all tables and in there all keys might cause some unwanted load spikes.
yeah, when netdev is being unregistered we have to drop the ref to avoid the warning. Speaking of delete notifiers for maps. Until recently all deletions were explicit and the program could do extra perf_event_submit() if it needed to notify user space of deletion. But right now we have LRU map that deletes automatically and this netdev_array will be deleting automatically too, so we probably need some sort of generic map notifier for all map types. It can be as simple as user space sleeping on read(map_fd, buf); or waiting for epoll on map_fd and kernel wakes it up when map element is deleted and pushes 'key/value' pair into the buf. That should be generic for all map types, but it needs some mechanism to avoid blocking, since number of deletions can be huge. Something similar to perf's event record 'lost N records' should do the trick.