Vadim Goncharov <vadimnuclight@xxxxxxxxx> writes: >> > OK, no crash is half of good, thanks. But how'd I lock it from >> > deletion? A "concurrent updates" are usually about memory area that >> > still persist as a whole, but in my particular case it's a >> > bpf_timer callback who does bpf_map_delete_elem(). This is a >> > conntrack-like map - an entry contains some information about >> > connection, a struct bpf_timer and expire field when this entry >> > should be deleted due to inactivity, based on information in a >> > connection (thus *_LRU map types are not suitable for me). So it's >> > possible for a race condition here: >> > >> > 1. Thread 1 receives packet, does bpf_map_lookup_elem() and begins >> > processing, at end of which bpf_timer_start() will be called to >> > reschedule removal into future. >> > >> > 2. At moment after lookup, timer callback fires and does >> > bpf_map_delete_elem() while first thread is not yet finished. >> > >> > So how do I prevent connection record loss in such scenario? >> >> Yeah, there is no protection against this; you will have to do your >> own locking to prevent it. Something like putting a spinlock into >> your data structure and using that to synchronise access, maybe? > > Well, if I'll put bpf_spin_lock into map element, then it looks like > the following scenario is possible: > > 1. Thread 1 receives packet, does bpf_map_lookup_elem(map, key) > > 1a. Timer callback(map, same_key) does bpf_spin_lock on same_key > > 2. Thread 2 does bpf_spin_lock on same_key and waits. > > 2a. Timer callback sees lock successful, sets DELETED flag in entry > and bpf_map_delete_elem()'s it. > > 3. Thread 1 unlocked and either updates map entry directly by pointer, > or reinserts new entry reinitializing timer. Yeah, something like this. > Am I missing some race condition still present here? atomics or > something Unfortunately, I think the delete and update can still race. Deletion is based on the key only, there is no cmpxchg() semantics. So I don't see any reason why this sequence of events couldn't happen: - T2 sets the DELETED flag - T2 releases the lock, and then gets preempted - T1 grabs the lock, sees the deleted flag, allocates a new item and inserts it with update() - T2 wakes back up and does a bpf_map_delete() with the expired key, which then ends up deleting the new entry The only way I can think of to avoid this, is if the timer callback uses bpf_map_lookup_and_delete_item(), then rechecks the value and re-inserts it if the DELETED flag disappeared between operations. But this still leaves a window where the entry doesn't exist. Maybe someone else has a good idea for how to avoid this being racy? -Toke