Björn Töpel <bjorn.topel@xxxxxxxxx> writes:

> On 2021-01-20 15:52, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@xxxxxxxxx> writes:
>>
>>> On 2021-01-20 13:44, Toke Høiland-Jørgensen wrote:
>>>> Björn Töpel <bjorn.topel@xxxxxxxxx> writes:
>>>>
>>>>> From: Björn Töpel <bjorn.topel@xxxxxxxxx>
>>>>>
>>>>> The XDP_REDIRECT implementations for maps and non-maps are fairly
>>>>> similar, but obviously need to take different code paths depending on
>>>>> whether the target is using a map or not. Today, the redirect targets
>>>>> for XDP either use a map, or are based on ifindex.
>>>>>
>>>>> Future commits will introduce yet another redirect target via a
>>>>> new helper, bpf_redirect_xsk(). To pave the way for that, we introduce
>>>>> an explicit redirect type to bpf_redirect_info. This makes the code
>>>>> easier to follow, and makes it easier to add new redirect targets.
>>>>>
>>>>> Further, using an explicit type in bpf_redirect_info has a slight
>>>>> positive performance impact by avoiding a pointer indirection for the
>>>>> map type lookup, and instead uses the hot cacheline for
>>>>> bpf_redirect_info.
>>>>>
>>>>> The bpf_redirect_info flags member is not used by XDP, and is no
>>>>> longer read or written. The map member is only written to when
>>>>> required/used, and not unconditionally.
>>>>
>>>> I like the simplification. However, the handling of map clearing becomes
>>>> a bit murky with this change:
>>>>
>>>> You're not changing anything in bpf_clear_redirect_map(), and you're
>>>> removing most of the reads and writes of ri->map. Instead,
>>>> bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in
>>>> ri->tgt_value, which xdp_do_redirect() will just read and use without
>>>> checking. But if the map element (or the entire map) has been freed in
>>>> the meantime that will be a dangling pointer.
>>>> I *think* the RCU callback in dev_map_delete_elem() and the
>>>> rcu_barrier() in dev_map_free() protect against this, but that is by
>>>> no means obvious. So confirming this, and explaining it in a comment
>>>> would be good.
>>>>
>>>
>>> Yes, *most* of the READ_ONCE(ri->map) are removed, it's pretty much only
>>> the bpf_redirect_map(), and as you write, the tracepoints.
>>>
>>> The content/element of the map is RCU protected, and actually even the
>>> map will be around until the XDP processing is complete. Note the
>>> synchronize_rcu() that follows all bpf_clear_redirect_map() calls.
>>>
>>> I'll try to make it clearer in the commit message! Thanks for pointing
>>> that out!
>>>
>>>> Also, as far as I can tell after this, ri->map is only used for the
>>>> tracepoint. So how about just storing the map ID and getting rid of the
>>>> READ/WRITE_ONCE() entirely?
>>>>
>>>
>>> ...and the bpf_redirect_map() helper. Don't you think the current
>>> READ_ONCE(ri->map) scheme is more obvious/clear?
>>
>> Yeah, after your patch we WRITE_ONCE() the pointer in
>> bpf_redirect_map(), but the only place it is actually *read* is in the
>> tracepoint. So the only purpose of bpf_clear_redirect_map() is to ensure
>> that an invalid pointer is not read in the tracepoint function. Which
>> seems a bit excessive when we could just store the map ID for direct use
>> in the tracepoint and get rid of bpf_clear_redirect_map() entirely, no?
>>
>> Besides, from a UX point of view, having the tracepoint display the map
>> ID even if that map ID is no longer valid seems to me like it makes more
>> sense than just displaying a map ID of 0 and leaving it up to the user
>> to figure out that this is because the map was cleared. I mean, at the
>> time the redirect was made, that *was* the map ID that was used...
>>
>
> Convinced! Getting rid of bpf_clear_redirect_map() would be good! I'll
> take a stab at this for v3!

Cool!
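For illustration only: a rough userspace model of the two ideas discussed here, an explicit redirect type in the redirect state and copying the map ID (instead of keeping a map pointer) for the tracepoint. All names (enum redirect_type, struct redirect_info, struct fake_map, set_redirect_map(), trace_redirect(), do_redirect()) are hypothetical simplifications, not the kernel's actual definitions; RCU, per-CPU storage and real netdev/devmap lookups are left out.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical redirect target kinds (illustrative names only). */
enum redirect_type {
	REDIRECT_NONE = 0,
	REDIRECT_IFINDEX,	/* bpf_redirect(ifindex)-style target */
	REDIRECT_DEVMAP,	/* bpf_redirect_map() into a devmap */
};

/* Stand-in for a BPF map; only its ID is of interest here. */
struct fake_map {
	uint32_t id;
};

/* Simplified model of the per-CPU redirect state (hypothetical layout). */
struct redirect_info {
	enum redirect_type type;	/* explicit type: no map->map_type lookup */
	uint32_t tgt_index;		/* ifindex or map key */
	void *tgt_value;		/* looked-up map element, if any */
	uint32_t map_id;		/* copied when the helper runs */
};

static void reset_redirect(struct redirect_info *ri)
{
	memset(ri, 0, sizeof(*ri));	/* type becomes REDIRECT_NONE */
}

static void set_redirect_ifindex(struct redirect_info *ri, uint32_t ifindex)
{
	reset_redirect(ri);
	ri->type = REDIRECT_IFINDEX;
	ri->tgt_index = ifindex;
}

static void set_redirect_map(struct redirect_info *ri,
			     const struct fake_map *map, uint32_t key, void *elem)
{
	assert(map != NULL);
	reset_redirect(ri);
	ri->type = REDIRECT_DEVMAP;
	ri->tgt_index = key;
	ri->tgt_value = elem;
	ri->map_id = map->id;	/* keep the ID, not the map pointer */
}

/* Tracepoint model: reads only plain values, never a map pointer. */
static int trace_redirect(const struct redirect_info *ri, char *buf, size_t len)
{
	return snprintf(buf, len, "type=%d map_id=%" PRIu32 " index=%" PRIu32,
			(int)ri->type, ri->map_id, ri->tgt_index);
}

/* Dispatch on the explicit type; 0 on success, -1 if nothing to do. */
static int do_redirect(struct redirect_info *ri)
{
	int ret;

	switch (ri->type) {
	case REDIRECT_IFINDEX:
		ret = 0;	/* real code would look up the netdev here */
		break;
	case REDIRECT_DEVMAP:
		ret = ri->tgt_value ? 0 : -1;	/* real code would enqueue */
		break;
	default:
		ret = -1;
		break;
	}
	ri->type = REDIRECT_NONE;	/* a redirect is consumed only once */
	return ret;
}
```

Because the ID is copied when the helper runs, the modelled tracepoint can still report the map that was used even if the map is released afterwards, and there is no map pointer left for something like bpf_clear_redirect_map() to clear.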
>> Oh, and as you say, due to the synchronize_rcu() call in dev_map_free()
>> I think this whole discussion is superfluous anyway, since it can't
>> actually happen that the map gets freed between the setting and reading
>> of ri->map, no?
>>
>
> It can't be freed, but ri->map can be cleared via
> bpf_clear_redirect_map(). So, between the helper (setting) and the
> tracepoint in xdp_do_redirect() it can be cleared (say if the XDP
> program is swapped out, prior to running xdp_do_redirect()).

But xdp_do_redirect() should be called on driver flush before exiting
the NAPI cycle, so how can the XDP program be swapped out?

> Moving to the scheme you suggested does make the discussion
> superfluous. :-)

Yup, awesome :)

-Toke