Re: [PATCH bpf-next v2 1/8] xdp: restructure redirect actions

Björn Töpel <bjorn.topel@xxxxxxxxx> writes:

> On 2021-01-20 13:44, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@xxxxxxxxx> writes:
>> 
>>> From: Björn Töpel <bjorn.topel@xxxxxxxxx>
>>>
>>> The XDP_REDIRECT implementations for maps and non-maps are fairly
>>> similar, but obviously need to take different code paths depending on
>>> whether the target is a map or not. Today, the redirect targets for
>>> XDP either use a map or are based on an ifindex.
>>>
>>> Future commits will introduce yet another redirect target via a
>>> new helper, bpf_redirect_xsk(). To pave the way for that, we introduce
>>> an explicit redirect type to bpf_redirect_info. This makes the code
>>> easier to follow, and makes it easier to add new redirect targets.
>>>
>>> Further, using an explicit type in bpf_redirect_info has a slight
>>> positive performance impact by avoiding a pointer indirection for the
>>> map type lookup, instead using the hot cacheline of
>>> bpf_redirect_info.
>>>
>>> The bpf_redirect_info flags member is not used by XDP, and is no
>>> longer read or written. The map member is only written to when
>>> required/used, and not unconditionally.
>> 
>> I like the simplification. However, the handling of map clearing becomes
>> a bit murky with this change:
>> 
>> You're not changing anything in bpf_clear_redirect_map(), and you're
>> removing most of the reads and writes of ri->map. Instead,
>> bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in
>> ri->tgt_value, which xdp_do_redirect() will just read and use without
>> checking. But if the map element (or the entire map) has been freed in
>> the meantime that will be a dangling pointer. I *think* the RCU callback
>> in dev_map_delete_elem() and the rcu_barrier() in dev_map_free()
>> protect against this, but that is by no means obvious. So confirming
>> this and explaining it in a comment would be good.
>>
>
> Yes, *most* of the READ_ONCE(ri->map) calls are removed; what remains is
> pretty much only bpf_redirect_map() and, as you write, the tracepoints.
>
> The content/element of the map is RCU protected, and actually even the
> map will be around until the XDP processing is complete. Note the
> synchronize_rcu() that follows all bpf_clear_redirect_map() calls.
>
> I'll try to make it clearer in the commit message! Thanks for pointing 
> that out!
>
>> Also, as far as I can tell after this, ri->map is only used for the
>> tracepoint. So how about just storing the map ID and getting rid of the
>> READ/WRITE_ONCE() entirely?
>>
>
> ...and the bpf_redirect_map() helper. Don't you think the current
> READ_ONCE(ri->map) scheme is more obvious/clear?

Yeah, after your patch we WRITE_ONCE() the pointer in
bpf_redirect_map(), but the only place it is actually *read* is in the
tracepoint. So the only purpose of bpf_clear_redirect_map() is to ensure
that an invalid pointer is not read in the tracepoint function, which
seems a bit excessive when we could just store the map ID for direct use
in the tracepoint and get rid of bpf_clear_redirect_map() entirely, no?

Besides, from a UX point of view, having the tracepoint display the map
ID, even if that map ID is no longer valid, makes more sense to me
than just displaying a map ID of 0 and leaving it up to the user
to figure out that this is because the map was cleared. I mean, at the
time the redirect was made, that *was* the map ID that was used...
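To make the suggestion concrete, here is a minimal userspace sketch. It is not kernel code, and all model_* names are hypothetical. It assumes a redirect-info struct that carries the explicit redirect type from the patch, plus a copied map ID in place of the map pointer. Dispatch uses only the type and the tracepoint reads only the ID, so nothing has to be cleared when a map goes away. RCU is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model, not kernel code: every model_* name is
 * made up for illustration. RCU is not modelled; tgt_value lifetime is
 * assumed to be handled as in the kernel.
 */
enum model_redirect_type {
	MODEL_REDIRECT_NONE = 0,
	MODEL_REDIRECT_IFINDEX,		/* plain ifindex target */
	MODEL_REDIRECT_DEVMAP,		/* target found via a devmap */
};

enum model_action {
	MODEL_ACTION_ABORTED = -1,
	MODEL_ACTION_XMIT_IFINDEX,	/* would look up the netdev by ifindex */
	MODEL_ACTION_ENQUEUE_DEVMAP,	/* would enqueue to ri->tgt_value */
};

struct model_map {
	uint32_t id;
};

struct model_redirect_info {
	enum model_redirect_type type;	/* explicit type, no map deref needed */
	uint32_t tgt_index;		/* ifindex or map key */
	void *tgt_value;		/* looked-up map element, if any */
	uint32_t map_id;		/* plain copy for tracing, never cleared */
};

static struct model_map *model_map_create(uint32_t id)
{
	struct model_map *map = malloc(sizeof(*map));

	if (map)
		map->id = id;
	return map;
}

static void model_map_destroy(struct model_map *map)
{
	free(map);
}

/* Redirect via a map: copy the map ID instead of keeping the map pointer. */
static void model_redirect_map(struct model_redirect_info *ri,
			       const struct model_map *map, uint32_t key,
			       void *elem)
{
	assert(ri && map);
	ri->type = MODEL_REDIRECT_DEVMAP;
	ri->tgt_index = key;
	ri->tgt_value = elem;
	ri->map_id = map->id;
}

/* Redirect via ifindex: no map involved, so no map ID to report. */
static void model_redirect_ifindex(struct model_redirect_info *ri,
				   uint32_t ifindex)
{
	assert(ri);
	ri->type = MODEL_REDIRECT_IFINDEX;
	ri->tgt_index = ifindex;
	ri->tgt_value = NULL;
	ri->map_id = 0;
}

/* Tracepoint: reports the ID in use at redirect time, never touches a map. */
static uint32_t model_trace_map_id(const struct model_redirect_info *ri)
{
	return ri->type == MODEL_REDIRECT_DEVMAP ? ri->map_id : 0;
}

/* Redirect dispatch: pick the code path from the explicit type alone. */
static enum model_action
model_do_redirect(const struct model_redirect_info *ri)
{
	switch (ri->type) {
	case MODEL_REDIRECT_IFINDEX:
		return MODEL_ACTION_XMIT_IFINDEX;
	case MODEL_REDIRECT_DEVMAP:
		return ri->tgt_value ? MODEL_ACTION_ENQUEUE_DEVMAP
				     : MODEL_ACTION_ABORTED;
	default:
		return MODEL_ACTION_ABORTED;
	}
}
```

In this model, destroying the map after model_redirect_map() leaves model_trace_map_id() returning the original ID. There is no clearing step and no stored map pointer that could go stale.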

Oh, and as you say, due to the synchronize_rcu() call in dev_map_free(), I
think this whole discussion is superfluous anyway, since it can't
actually happen that the map gets freed between the setting and reading
of ri->map, no?

>> (Oh, and related to this I think this patch set will conflict with
>> Hangbin's multi-redirect series, so maybe you two ought to coordinate? :))
>>
>
> Yeah, good idea! I would guess Hangbin's would go in before this, so I
> would need to adapt.
>
>
> Thanks for taking a look at the series, Toke! Much appreciated!

You're welcome :)

-Toke
