On Wed, Feb 16, 2022 at 04:33:41AM +0000, Matthew Wilcox wrote:
> On Wed, Feb 16, 2022 at 01:17:03PM +0900, Byungchul Park wrote:
> > [    7.013330] ===================================================
> > [    7.013331] DEPT: Circular dependency has been detected.
> > [    7.013332] 5.17.0-rc1-00014-gcf3441bb2012 #2 Tainted: G W
> > [    7.013333] ---------------------------------------------------
> > [    7.013334] summary
> > [    7.013334] ---------------------------------------------------
> > [    7.013335] *** DEADLOCK ***
> > [    7.013335]
> > [    7.013335] context A
> > [    7.013336]     [S] (unknown)(&(&ei->socket.wq.wait)->dmap:0)
> > [    7.013337]     [W] __mutex_lock_common(&u->iolock:0)
> > [    7.013338]     [E] event(&(&ei->socket.wq.wait)->dmap:0)
> > [    7.013340]
> > [    7.013340] context B
> > [    7.013341]     [S] __raw_spin_lock(&u->lock:0)
> > [    7.013342]     [W] wait(&(&ei->socket.wq.wait)->dmap:0)
> > [    7.013343]     [E] spin_unlock(&u->lock:0)
>
> This seems unlikely to be real.  We're surely not actually waiting
> while holding a spinlock; existing debug checks would catch it.
>
> > [    7.013407] ---------------------------------------------------
> > [    7.013407] context B's detail
> > [    7.013408] ---------------------------------------------------
> > [    7.013408] context B
> > [    7.013409]     [S] __raw_spin_lock(&u->lock:0)
> > [    7.013410]     [W] wait(&(&ei->socket.wq.wait)->dmap:0)
> > [    7.013411]     [E] spin_unlock(&u->lock:0)
> > [    7.013412]
> > [    7.013412] [S] __raw_spin_lock(&u->lock:0):
> > [    7.013413] [<ffffffff81aa451f>] unix_stream_read_generic+0x6bf/0xb60
> > [    7.013416] stacktrace:
> > [    7.013416]    _raw_spin_lock+0x6e/0x90
> > [    7.013418]    unix_stream_read_generic+0x6bf/0xb60
>
> It would be helpful if you'd run this through scripts/decode_stacktrace.sh

(Sorry for the late reply; it was due to an issue with my email client.)

That was a big help. Thank you very much.

> so we could see line numbers instead of hex offsets (which aren't much
> use without the binary kernel).
>
> > [    7.013420]    unix_stream_recvmsg+0x40/0x50
> > [    7.013422]    sock_read_iter+0x85/0xd0
> > [    7.013424]    new_sync_read+0x162/0x180
> > [    7.013426]    vfs_read+0xf3/0x190
> > [    7.013428]    ksys_read+0xa6/0xc0
> > [    7.013429]    do_syscall_64+0x3a/0x90
> > [    7.013431]    entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [    7.013433]
> > [    7.013434] [W] wait(&(&ei->socket.wq.wait)->dmap:0):
> > [    7.013434] [<ffffffff810bb017>] prepare_to_wait+0x47/0xd0
>
> ... this may be the source of confusion.  Just because we prepare to
> wait doesn't mean we end up actually waiting.  For example, look at
> unix_wait_for_peer():
>
>         prepare_to_wait_exclusive(&u->peer_wait, &wait, TASK_INTERRUPTIBLE);
>
>         sched = !sock_flag(other, SOCK_DEAD) &&
>                 !(other->sk_shutdown & RCV_SHUTDOWN) &&
>                 unix_recvq_full(other);
>
>         unix_state_unlock(other);
>
>         if (sched)
>                 timeo = schedule_timeout(timeo);
>
>         finish_wait(&u->peer_wait, &wait);
>
> We *prepare* to wait, *then* drop the lock, then actually schedule.

That was a big help, too. I checked some call sites of the API, but in
almost all of them it is effectively "prepare == wait" :-(.

Thanks a lot!

Thanks,
Byungchul
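
P.S. For anyone skimming the thread, here is a minimal sketch of the idiom
Matthew describes, written in the style of the unix_wait_for_peer() excerpt
above. It is not taken from the kernel tree; the wait queue my_wq, the lock
my_lock and the condition my_cond are made-up names. The point is that
prepare_to_wait() only queues the waiter and sets the task state, and the
spinlock is released before schedule() runs, so nothing actually sleeps
while holding the lock:

        #include <linux/sched.h>
        #include <linux/spinlock.h>
        #include <linux/wait.h>

        static DECLARE_WAIT_QUEUE_HEAD(my_wq);  /* made-up wait queue */
        static DEFINE_SPINLOCK(my_lock);        /* made-up lock */
        static bool my_cond;                    /* made-up condition */

        static void wait_for_my_cond(void)
        {
                DEFINE_WAIT(wait);
                bool need_to_sleep;

                spin_lock(&my_lock);

                /* Queue ourselves on the wait queue; this does not sleep. */
                prepare_to_wait(&my_wq, &wait, TASK_INTERRUPTIBLE);

                /* Sample the condition while still holding the lock. */
                need_to_sleep = !my_cond;

                /* Drop the spinlock *before* going to sleep ... */
                spin_unlock(&my_lock);

                /* ... so the actual wait happens with no spinlock held. */
                if (need_to_sleep)
                        schedule();

                finish_wait(&my_wq, &wait);
        }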