Re: XDP/eBPF map thread safety in kernel (e.g. lookup/delete)

Vadim Goncharov <vadimnuclight@xxxxxxxxx> · Fri, 25 Oct 2024 00:18:19 +0300

On Thu, 24 Oct 2024 16:49:37 +0200
Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:

> Vadim Goncharov <vadimnuclight@xxxxxxxxx> writes:
> 
> > On Wed, 23 Oct 2024 14:10:11 +0200
> > Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >  
> >> Vadim Goncharov <vadimnuclight@xxxxxxxxx> writes:
> >>   
> >> > Hello,
> >> >
> >> > Where to find exact documentation about what happens in kernel
> >> > BPF helpers calls with respect to locking? For example, I have
> >> > `bpf_map_lookup_elem()` in one thread, then work on pointer, and
> >> > at this time, another thread does `bpf_map_delete_elem()` for
> >> > exactly same key. What happens to memory the first thread still
> >> > continue to work on? Is it now dangling pointer to nowhere?
> >> >
> >> > In my particular case it's a bpf_timer callback who does
> >> > `bpf_map_delete_elem()`. I'd prefer for it to not delete entry if
> >> > another thread did `lookup` and works already, is it possible to
> >> > do so (in a performant way)?    
> >> 
> >> Map elements are RCU protected,   
> >
> > I see this phrase everywhere, but always without details, whether
> > it's about just map structures itself (OK, this minimum is
> > guaranteed) or the user data, too.  
> 
> The user data in a map item is allocated together with the
> kernel-internal bits, so this goes for everything.

OK, got it.

> >> so you already get exactly the
> >> behaviour you're after: if another thread deletes a map element
> >> that you already looked up, the element is guaranteed to stick
> >> around in memory until the BPF program exits.
> >> 
> >> It won't be valid anymore *after* that of course, so if you're
> >> doing concurrent updates it's you own responsibility to sync
> >> appropriately. But there is no risk of the pointer suddenly being
> >> invalid in the middle of the program execution (so no crashes) :)  
> >
> > OK, no crash is half of good, thanks. But how'd I lock it from
> > deletion? A "concurrent updates" are usually about memory area that
> > still persist as a whole, but in my particular case it's a
> > bpf_timer callback who does bpf_map_delete_elem(). This is a
> > conntrack-like map - an entry contains some information about
> > connection, a struct bpf_timer and expire field when this entry
> > should be deleted due to inactivity, based on information in a
> > connection (thus *_LRU map types are not suitable for me). So it's
> > possible for a race condition here:
> >
> > 1.  Thread 1 receives packet, does bpf_map_lookup_elem() and begins
> >     processing, at end of which bpf_timer_start() will be called to
> >     reschedule removal into future.
> >
> > 2.  At moment after lookup, timer callback fires and does
> >     bpf_map_delete_elem() while first thread is not yet finished.
> >
> > So how do I prevent connection record loss in such scenario?  
> 
> Yeah, there is no protection against this; you will have to do your
> own locking to prevent it. Something like putting a spinlock into
> your data structure and using that to synchronise access, maybe?

Well, if I'll put bpf_spin_lock into map element, then it looks like
the following scenario is possible:

1. Thread 1 receives packet, does bpf_map_lookup_elem(map, key)

     1a. Timer callback(map, same_key) does bpf_spin_lock on same_key

2. Thread 2 does bpf_spin_lock on same_key and waits.

     2a. Timer callback sees lock successful, sets DELETED flag in entry
         and bpf_map_delete_elem()'s it.

3. Thread 1 unlocked and either updates map entry directly by pointer,
    or reinserts new entry reinitializing timer.

Am I missing some race condition still present here? atomics or something

-- 
WBR, @nuclight