Jakub Sitnicki wrote:
> On Thu, Feb 06, 2020 at 06:51 AM CET, John Fastabend wrote:
> > Jakub Sitnicki wrote:
> >> On Sat, Jan 11, 2020 at 07:12 AM CET, John Fastabend wrote:
> >> > The sock_map_free() and sock_hash_free() paths used to delete sockmap
> >> > and sockhash maps walk the maps and destroy psock and bpf state associated
> >> > with the socks in the map. When done the socks no longer have BPF programs
> >> > attached and will function normally. This can happen while the socks in
> >> > the map are still "live" meaning data may be sent/received during the walk.

[...]

> >>
> >> John, I've noticed this is triggering warnings that we might sleep in
> >> lock_sock while (1) in RCU read-side section, and (2) holding a spin
> >> lock:
> >
> > [...]
> >
> >>
> >> Here's an idea how to change the locking. I'm still wrapping my head
> >> around what protects what in sock_map_free, so please bear with me:
> >>
> >> 1. synchronize_rcu before we iterate over the array is not needed,
> >>    AFAICT. We are not free'ing the map just yet, hence any readers
> >>    accessing the map via the psock are not in danger of use-after-free.
> >
> > Agreed. When we added 2bb90e5cc90e ("bpf: sockmap, synchronize_rcu before
> > free'ing map") we could have done this.
> >
> >>
> >> 2. rcu_read_lock is needed to protect access to psock inside
> >>    sock_map_unref, but we can't sleep while in RCU read-side. So push
> >>    it down, after we grab the sock lock.
> >
> > yes this looks better.
> >
> >>
> >> 3. Grabbing stab->lock seems not needed, either. We get called from
> >>    bpf_map_free_deferred, after map refcnt dropped to 0, so we're not
> >>    racing with any other map user to modify its contents.
> >
> > This I'll need to think on a bit. We have the link-lock there so
> > probably should be safe to drop. But will need to trace this through
> > git history to be sure.
> >
>
> [...]
>
> >> WDYT?
> >
> > Can you push the fix to bpf but leave the stab->lock for now. I think
> > we can do a slightly better cleanup on stab->lock in bpf-next.
>
> Here it is:
>
> https://lore.kernel.org/bpf/20200206111652.694507-1-jakub@xxxxxxxxxxxxxx/T/#t
>
> I left the "extra" synchronize_rcu before walking the map. On second
> thought, this isn't a bug. Just adds extra wait. bpf-next material?

Agree.

> >
> >> Reproducer follows.
> >
> > push reproducer into selftests?
>
> Included the reproducer with the fixes. If it gets dropped from the
> series, I'll resubmit it once bpf-next reopens.

Yeah, I don't have a strong preference where it lands. I have a set of
tests for bpf-next once it opens as well.
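
For reference, a minimal sketch of the teardown loop with the lock ordering
discussed above (points 1 and 2): take lock_sock() first, since it may sleep,
and only then enter the RCU read-side section that sock_map_unref() needs.
This assumes the sock_map_free() structure from net/core/sock_map.c around
v5.5; the stab->lock handling (point 3) is left out because it is still under
discussion, the function name is made up for illustration, and this is not
the patch that was actually posted.

/*
 * Illustrative sketch only -- not the posted fix. Mirrors the shape of
 * sock_map_free() in net/core/sock_map.c (~v5.5) to show the lock order
 * being discussed.
 */
#include <linux/bpf.h>
#include <linux/skmsg.h>
#include <net/sock.h>

static void sock_map_free_sketch(struct bpf_map *map)
{
	struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
	int i;

	/* Point 1: no synchronize_rcu() needed before the walk; the map
	 * itself is not freed until the end of this function.
	 */
	for (i = 0; i < stab->map.max_entries; i++) {
		struct sock **psk = &stab->sks[i];
		struct sock *sk;

		sk = xchg(psk, NULL);
		if (!sk)
			continue;

		/* Point 2: lock_sock() may sleep, so take it before the
		 * RCU read-side section protecting the psock, not inside it.
		 */
		lock_sock(sk);
		rcu_read_lock();
		sock_map_unref(sk, psk);
		rcu_read_unlock();
		release_sock(sk);
	}

	/* Wait for any outstanding RCU readers before the map goes away. */
	synchronize_rcu();

	bpf_map_area_free(stab->sks);
	kfree(stab);
}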