Hangbin Liu <liuhangbin@xxxxxxxxx> writes: > On Wed, Mar 31, 2021 at 03:41:17PM +0200, Toke Høiland-Jørgensen wrote: >> > @@ -1491,13 +1492,20 @@ static __always_inline int __bpf_xdp_redirect_map(struct bpf_map *map, u32 ifind >> > */ >> > ri->map_id = INT_MAX; /* Valid map id idr range: [1,INT_MAX[ */ >> > ri->map_type = BPF_MAP_TYPE_UNSPEC; >> > - return flags; >> > + return flags & BPF_F_ACTION_MASK; >> > } >> > >> > ri->tgt_index = ifindex; >> > ri->map_id = map->id; >> > ri->map_type = map->map_type; >> > >> > + if ((map->map_type == BPF_MAP_TYPE_DEVMAP || >> > + map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) && >> > + (flags & BPF_F_BROADCAST)) { >> > + ri->flags = flags; >> >> This, combined with this: >> >> [...] >> >> > + if (ri->flags & BPF_F_BROADCAST) { >> > + map = READ_ONCE(ri->map); >> > + WRITE_ONCE(ri->map, NULL); >> > + } >> > + >> > switch (map_type) { >> > case BPF_MAP_TYPE_DEVMAP: >> > fallthrough; >> > case BPF_MAP_TYPE_DEVMAP_HASH: >> > - err = dev_map_enqueue(fwd, xdp, dev); >> > + if (ri->flags & BPF_F_BROADCAST) >> > + err = dev_map_enqueue_multi(xdp, dev, map, >> > + ri->flags & BPF_F_EXCLUDE_INGRESS); >> > + else >> > + err = dev_map_enqueue(fwd, xdp, dev); >> >> Means that (since the flags value is never cleared again) once someone >> has done a broadcast redirect, that's all they'll ever get until the >> next reboot ;) > > How about just get the ri->flags first and clean it directly. e.g. > > flags = ri->flags; > ri->flags = 0; That would fix the "until next reboot" issue, but would still render bpf_clear_redirect_map() useless. So you still need to check ri->map and if you do that you don't actually need to clear the flag field as long as it is set correctly whenever the map pointer is. > With this we don't need to add an extra field ri->exclude_ingress as you > mentioned later. The ri->exclude_ingress would be *instead* of the flags field. You could of course also just keep the flags field, but turning it into a bool makes it obvious that only one of the bits is actually used (and thus easier to see that it's correct to not clear it). > People may also need the flags field in future. In which case they can add it back at that time :) >> Also, the bpf_clear_redirect_map() call has no effect since the call to >> dev_map_enqueue_multi() only checks the flags and not the value of the >> map pointer before deciding which enqueue function to call. >> >> To fix both of these, how about changing the logic so that: >> >> - __bpf_xdp_redirect_map() sets the map pointer if the broadcast flag is >> set (and clears it if the flag isn't set!) > > OK >> >> - xdp_do_redirect() does the READ_ONCE/WRITE_ONCE on ri->map >> unconditionally and then dispatches to dev_map_enqueue_multi() if the >> read resulted in a non-NULL pointer >> >> Also, it should be invalid to set the broadcast flag with a map type >> other than a devmap; __bpf_xdp_redirect_map() should check that. > > The current code only do if (unlikely(flags > XDP_TX)) and didn't check the > map type. I also only set the map when there has devmap + broadcast flag. > Do you mean we need a more strict check? like > > if (unlikely((flags & ~(BPF_F_ACTION_MASK | BPF_F_REDIR_MASK)) || > (map->map_type != BPF_MAP_TYPE_DEVMAP && > map->map_type != BPF_MAP_TYPE_DEVMAP_HASH && > flags & BPF_F_REDIR_MASK))) Yeah, that's what I meant, but when you type it out that does seem like a bit too many checks. However, I think we can do something different: Since Björn has helpfully split out the helper functions for the different map types, we can add another argument to __bpf_xdp_redirect_map() which is the mask of valid flag values. With this, dev_{hash_,}map_redirect() can include BPF_F_REDIR_MASK in the valid flags, and {xsk,cpu}_map_redirect() can leave them out. That makes the code do the right thing without actually adding any more checks in the fast path :) (You'd still need to AND the return value with BPF_F_ACTION_MASK when returning, of course). -Toke