On Wed, Jan 8, 2020 at 1:14 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote: > > The sock_map_free() and sock_hash_free() paths used to delete sockmap > and sockhash maps walk the maps and destroy psock and bpf state associated > with the socks in the map. When done the socks no longer have BPF programs > attached and will function normally. This can happen while the socks in > the map are still "live" meaning data may be sent/received during the walk. > > Currently, though we don't take the sock_lock when the psock and bpf state > is removed through this path. Specifically, this means we can be writing > into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc. > while they are also being called from the networking side. This is not > safe, we never used proper READ_ONCE/WRITE_ONCE semantics here if we > believed it was safe. Further its not clear to me its even a good idea > to try and do this on "live" sockets while networking side might also > be using the socket. Instead of trying to reason about using the socks > from both sides lets realize that every use case I'm aware of rarely > deletes maps, in fact kubernetes/Cilium case builds map at init and > never tears it down except on errors. So lets do the simple fix and > grab sock lock. > > This patch wraps sock deletes from maps in sock lock and adds some > annotations so we catch any other cases easier. > > Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") > Signed-off-by: John Fastabend <john.fastabend@xxxxxxxxx> Acked-by: Song Liu <songliubraving@xxxxxx>