Hi Stan, On Wed, Oct 12, 2022 at 02:08 PM -07, sdf@xxxxxxxxxx wrote: > Hi John & Jakub, > > Upstream commit c0feea594e05 ("workqueue: don't skip lockdep work > dependency in cancel_work_sync()") seems to trigger the following > lockdep warning during test_prog's sockmap_listen: > > [ +0.003631] WARNING: possible circular locking dependency detected [...] > Are you ware? Any idea what's wrong? > Is there some stable fix I'm missing in bpf-next? Thanks for bringing it up. I didn't know. The mentioned commit doesn't look that fresh commit c0feea594e058223973db94c1c32a830c9807c86 Author: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> Date: Fri Jul 29 13:30:23 2022 +0900 workqueue: don't skip lockdep work dependency in cancel_work_sync() ... but then it just landed not so long ago, which explains things: $ git describe --contains c0feea594e058223973db94c1c32a830c9807c86 --match 'v*' v6.0-rc7~10^2 I've untangled the call chains leading to the potential dead-lock a bit. There does seem to be a window of opportunity there. psock->work.func = sk_psock_backlog() ACQUIRE psock->work_mutex sk_psock_handle_skb() skb_send_sock() __skb_send_sock() sendpage_unlocked() kernel_sendpage() sock->ops->sendpage = inet_sendpage() sk->sk_prot->sendpage = tcp_sendpage() ACQUIRE sk->sk_lock tcp_sendpage_locked() RELEASE sk->sk_lock RELEASE psock->work_mutex sock_map_close() ACQUIRE sk->sk_lock sk_psock_stop() sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED) cancel_work_sync() __cancel_work_timer() __flush_work() // wait for psock->work to finish RELEASE sk->sk_lock There is no fix I know of. Need to think. Ideas welcome. CC Cong, just FYI, because we did rearrange the locking scheme in [1]. However it looks to me like the dead-lock was already there before that. [1] https://lore.kernel.org/bpf/20210331023237.41094-5-xiyou.wangcong@xxxxxxxxx/