On Sat, Apr 24, 2021 at 09:01:29AM +0200, Jesper Dangaard Brouer wrote:
> > > > >> @@ -3942,7 +3960,12 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> > > > >>  	case BPF_MAP_TYPE_DEVMAP:
> > > > >>  		fallthrough;
> > > > >>  	case BPF_MAP_TYPE_DEVMAP_HASH:
> > > > >> -		err = dev_map_enqueue(fwd, xdp, dev);
> > > > >> +		map = xchg(&ri->map, NULL);
> > > > >
> > > > > Hmm, this looks dangerous for performance to have on this fast-path.
> > > > > The xchg call can be expensive; AFAIK this is an atomic operation.
> > > >
> > > > Ugh, you're right. That's my bad, I suggested replacing the
> > > > READ_ONCE()/WRITE_ONCE() pair with the xchg() because an exchange is
> > > > what it's doing, but I failed to consider the performance implications
> > > > of the atomic operation. Sorry about that, Hangbin! I guess this should
> > > > be changed to:
> > > >
> > > > +		map = READ_ONCE(ri->map);
> > > > +		if (map) {
> > > > +			WRITE_ONCE(ri->map, NULL);
> > > > +			err = dev_map_enqueue_multi(xdp, dev, map,
> > > > +						    ri->flags & BPF_F_EXCLUDE_INGRESS);
> > > > +		} else {
> > > > +			err = dev_map_enqueue(fwd, xdp, dev);
> > > > +		}
> > >
> > > This is highly sensitive fast-path code; as you saw, Bjørn has been
> > > hunting nanoseconds in this area. The above code implicitly treats
> > > "map" as the likely option, which I don't think it is.
> >
> > Hi Jesper,
> >
> > From the performance data, there is only a slight impact. Do we still
> > need to block the whole patch on this? Or do you have a better solution?
>
> I'm basically just asking you to add an unlikely() annotation:
>
> 	map = READ_ONCE(ri->map);
> 	if (unlikely(map)) {
> 		WRITE_ONCE(ri->map, NULL);
> 		err = dev_map_enqueue_multi(xdp, dev, map, [...]
>
> For XDP, performance is the single most important factor! You say your
> performance data shows only a slight impact, but there must be ZERO
> impact (when your added feature is not in use).
>
> Your data:
>
> Version          | Test                    | Generic | Native
> 5.12 rc4         | redirect_map i40e->i40e | 1.9M    | 9.6M
> 5.12 rc4 + patch | redirect_map i40e->i40e | 1.9M    | 9.3M
>
> The performance difference 9.6M -> 9.3M is a slowdown of 3.36 nanosec.
> Bjørn and others have been working really hard to optimize the code and
> remove overheads down to 1.5 nanosec. Thus, introducing 3.36 nanosec of
> added overhead on the fast-path is significant.

I re-checked the performance data. The data

> Version          | Test                    | Generic | Native
> 5.12 rc4         | redirect_map i40e->i40e | 1.9M    | 9.6M
> 5.12 rc4 + patch | redirect_map i40e->i40e | 1.9M    | 9.3M

was collected on version 5 of the patch set. Today I re-did the test on
version 10, with the xchg() changed back to READ_ONCE()/WRITE_ONCE().
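For reference, the merged check path with the suggested annotation
applied looks like this (my reading of the two snippets quoted above;
not necessarily the final committed code):

	case BPF_MAP_TYPE_DEVMAP:
		fallthrough;
	case BPF_MAP_TYPE_DEVMAP_HASH:
		map = READ_ONCE(ri->map);
		if (unlikely(map)) {
			/* ri->map is only set when the program asked for a
			 * multi/broadcast redirect, so hint that the plain
			 * single-device enqueue below is the common case.
			 */
			WRITE_ONCE(ri->map, NULL);
			err = dev_map_enqueue_multi(xdp, dev, map,
						    ri->flags & BPF_F_EXCLUDE_INGRESS);
		} else {
			err = dev_map_enqueue(fwd, xdp, dev);
		}
		break;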
Here is the new data (the Generic-path column is omitted as it did not
change):

Version          | Test                                | Native
5.12 rc4         | redirect_map i40e->i40e             | 9.7M
5.12 rc4         | redirect_map i40e->veth             | 11.8M
5.12 rc4 + patch | redirect_map i40e->i40e             | 9.6M
5.12 rc4 + patch | redirect_map i40e->veth             | 11.6M
5.12 rc4 + patch | redirect_map multi i40e->i40e       | 9.5M
5.12 rc4 + patch | redirect_map multi i40e->veth       | 11.5M
5.12 rc4 + patch | redirect_map multi i40e->mlx4+veth  | 3.9M

And after adding unlikely() to the check path, the new data looks like:

Version          | Test                                | Native
5.12 rc4 + patch | redirect_map i40e->i40e             | 9.6M
5.12 rc4 + patch | redirect_map i40e->veth             | 11.7M
5.12 rc4 + patch | redirect_map multi i40e->i40e       | 9.4M
5.12 rc4 + patch | redirect_map multi i40e->veth       | 11.4M
5.12 rc4 + patch | redirect_map multi i40e->mlx4+veth  | 3.8M

So with unlikely(), plain redirect_map comes out slightly higher, while
the redirect_map broadcast cases drop a little. Overall, with this
version there is not much of a gap for redirect_map compared with a
kernel without the patch. Do you think we still need the unlikely() in
the check path?

Thanks
Hangbin
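P.S. For anyone checking the per-packet cost behind these Mpps numbers
(my arithmetic from the tables above, using ns/pkt = 1000/Mpps; not a
separate measurement):

  v5, xchg():                         1000/9.3 - 1000/9.6 ~= 3.4 ns/pkt added
  v10, READ_ONCE()/WRITE_ONCE() +
       unlikely():                    1000/9.6 - 1000/9.7 ~= 1.1 ns/pkt added

So the fast-path cost when the new feature is unused drops to roughly a
nanosecond with the annotated version.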