On 2019/06/07 18:32, Jesper Dangaard Brouer wrote:
On Fri, 7 Jun 2019 11:22:00 +0900 Toshiaki Makita <toshiaki.makita1@xxxxxxxxx> wrote:On 2019/06/07 4:41, Jesper Dangaard Brouer wrote:On Thu, 6 Jun 2019 20:04:20 +0900 Toshiaki Makita <toshiaki.makita1@xxxxxxxxx> wrote:On 2019/06/05 16:59, Jesper Dangaard Brouer wrote:On Wed, 5 Jun 2019 14:36:12 +0900 Toshiaki Makita <toshiaki.makita1@xxxxxxxxx> wrote:[...]So... prog_id is the problem. The program can be changed while we are enqueueing packets to the bulk queue, so the prog_id at flush may be an unexpected one.Hmmm... that sounds problematic, if the XDP bpf_prog for veth can change underneath, before the flush. Our redirect system, depend on things being stable until the xdp_do_flush_map() operation, as will e.g. set per-CPU (bpf_redirect_info) map_to_flush pointer (which depend on XDP prog), and expect it to be correct/valid.Sorry, I don't get how maps depend on programs.BPF/XDP programs have a reference count on the map (e.g. used for redirect) and when the XDP is removed, and last refcnt for the map is reached, then the map is also removed (redirect maps does a call_rcu when shutdown).
Thanks, now I understand what you mean.
I guess this could actually happen, but we are "saved" by theAt least xdp_do_redirect_map() handles map_to_flush change during NAPI. Is there a problem when the map is not changed but the program is changed? Also I believe this is not veth-specific behavior. Looking at tun and i40e, they seem to change xdp_prog without stopping data path.'map_to_flush' (pointer) is still valid due to RCU protection. But it does look fishy, as our rcu_read_lock's does not encapsulation this. There is RCU-read-section in veth_xdp_rcv_skb(), which via can call xdp_do_redirect() which set per-CPU ri->map_to_flush. Do we get this protection by running under softirq, and does this prevent an RCU grace-period (call_rcu callbacks) from happening? (between veth_xdp_rcv_skb() and xdp_do_flush_map() in veth_poll())
We are trying to avoid the problem in dev_map_free()? /* To ensure all pending flush operations have completed wait for flush * bitmap to indicate all flush_needed bits to be zero on _all_ cpus. * Because the above synchronize_rcu() ensures the map is disconnected * from the program we can assume no new bits will be set. */ for_each_online_cpu(cpu) { unsigned long *bitmap = per_cpu_ptr(dtab->flush_needed, cpu); while (!bitmap_empty(bitmap, dtab->map.max_entries)) cond_resched(); } Not sure if this is working as expected.
To Toshiaki, regarding your patch 2/2, you are not affected by this per-CPU map storing, as you pass along the bulk-queue. I do see you point, with prog_id could change. Could you change the tracepoint to include the 'act' and place 'ifindex' above this in the struct, this way the 'act' member is in the same location/offset as other XDP tracepoints. I see the 'ifindex' as the identifier for this tracepoint (other have map_id or prog_id in this location).
Sure, thanks. Toshiaki Makita