Mon, 21 Oct 2019 at 10:23, David Wysochanski <dwysocha@xxxxxxxxxx>:
>
> Hi all,
>
> We recently found another problem when running a simple test case
> where multiple processes open/read/close a single file, and a network
> disruption or smb restart occurs.
>
> After just a few minutes of restarts, a few of the processes error
> out, but a few remain. If you stop the disruption, the share becomes
> accessible again, but the original processes stay blocked indefinitely
> waiting on lock_sem.
>
> Once this occurs, you'll see two threads deadlocked as below. Basically:
> thread1: down_read(lock_sem) inside cifs_strict_readv(); does not
> block, but then stalls on the network temporarily
> thread2: down_write(lock_sem) inside _cifsFileInfo_put; blocks due to thread1
> thread1: down_read(lock_sem) inside cifs_reopen_file (after
> reconnect); unblocks to reconnect, but then blocks due to thread2
>
> At this point there is a deadlock: both threads are waiting for each
> other to release lock_sem.
>
> The root of the problem seems to be that we hold the read semaphore
> across a network disruption, and there is no check when the connection
> comes back before we down_read a second time.
> Inside cifs_strict_readv there is a comment explaining why we need to
> hold the semaphore over the read: so no one can modify the lock list
> with a brlock that prevents writing.
> However, in this case at least, we are only opening files for reading,
> and inside _cifsFileInfo_put we need to down_write to delete the
> outstanding lock records.
> So we are not really hitting that case, but we are modifying the list.
>
> One approach to avoid the deadlock is for the reading process to drop
> lock_sem before re-acquiring it inside cifs_reopen_file.
> We would drop the lock just before the call, and then inside we would
> need to check for the conflicting-brlock condition and possibly error
> out somehow.
> But that is probably not the preferred approach.
>
> Instead, can the reader take down_read only once? cifs_relock_file is
> called from only one place (cifs_reopen_file), and only when we are
> reconnecting, but cifs_reopen_file is called from many places. I'm not
> sure whether there is something existing we can use to indicate that we
> don't need to re-acquire it, or whether we need to add a flag or
> something?
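For what it's worth, the lock ordering described above can be reproduced
outside the kernel. The standalone C program below is only an
illustration, not CIFS code: it uses glibc's writer-preferring rwlock
kind as a stand-in for the rwsem fairness that parks a second down_read()
behind a waiting writer, and all names in it are invented for the demo.
Running it deadlocks by design.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t lock_sem;

/* Plays the part of the cifs_strict_readv() task (pid 13275 above). */
static void *reader(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&lock_sem);   /* T+0: first read lock, succeeds */
	printf("reader: holding read lock, simulating the network stall\n");
	sleep(2);                           /* the writer queues up meanwhile */
	printf("reader: reopen path takes the read lock a second time...\n");
	pthread_rwlock_rdlock(&lock_sem);   /* T+2: queues behind the writer */
	printf("reader: never reached\n");
	return NULL;
}

/* Plays the part of the _cifsFileInfo_put() task (pid 13279 above). */
static void *writer(void *arg)
{
	(void)arg;
	sleep(1);                           /* arrive while the read lock is held */
	printf("writer: requesting write lock (blocks on the reader)\n");
	pthread_rwlock_wrlock(&lock_sem);   /* T+1: waits, and parks new readers */
	printf("writer: never reached\n");
	return NULL;
}

int main(void)
{
	pthread_rwlockattr_t attr;
	pthread_t r, w;

	pthread_rwlockattr_init(&attr);
	/* Writer-preferring, non-recursive readers: mimics rwsem fairness. */
	pthread_rwlockattr_setkind_np(&attr,
			PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
	pthread_rwlock_init(&lock_sem, &attr);

	pthread_create(&r, NULL, reader, NULL);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(r, NULL);              /* never returns: the threads deadlock */
	pthread_join(w, NULL);
	return 0;
}

Build with "cc -pthread demo.c"; both threads hang exactly like the two
tasks in the crash session that follows.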
>
>
> Here is the live crash session
>
> crash> ps -m | grep UN
> [0 01:54:10.166] [UN] PID: 13271 TASK: ffff93b8a7349e40 CPU: 2 COMMAND: "python3-openloo"
> [0 01:54:10.167] [UN] PID: 13273 TASK: ffff93b8b3310000 CPU: 2 COMMAND: "python3-openloo"
> [0 01:54:10.169] [UN] PID: 13269 TASK: ffff93b8a734dac0 CPU: 0 COMMAND: "python3-openloo"
> [0 01:54:10.208] [UN] PID: 13272 TASK: ffff93b8a7348000 CPU: 7 COMMAND: "python3-openloo"
> [0 01:54:10.216] [UN] PID: 13265 TASK: ffff93b8a7358000 CPU: 6 COMMAND: "python3-openloo"
> [0 01:54:10.172] [UN] PID: 13263 TASK: ffff93b8b3313c80 CPU: 2 COMMAND: "python3-openloo"
> [0 01:54:10.229] [UN] PID: 13267 TASK: ffff93b8a7359e40 CPU: 5 COMMAND: "python3-openloo"
> [0 01:54:10.234] [UN] PID: 13277 TASK: ffff93b8a7dddac0 CPU: 3 COMMAND: "python3-openloo"
> [0 01:54:10.235] [UN] PID: 13270 TASK: ffff93b8a734bc80 CPU: 1 COMMAND: "python3-openloo"
> [0 01:54:10.244] [UN] PID: 13276 TASK: ffff93b8a7dd8000 CPU: 6 COMMAND: "python3-openloo"
> [0 01:54:10.260] [UN] PID: 13264 TASK: ffff93b8b3315ac0 CPU: 6 COMMAND: "python3-openloo"
> [0 01:54:10.257] [UN] PID: 13274 TASK: ffff93b8a7ddbc80 CPU: 1 COMMAND: "python3-openloo"
> [0 01:54:31.262] [UN] PID: 13279 TASK: ffff93b8a7d90000 CPU: 1 COMMAND: "python3-openloo" <--- block same instant
> [0 01:54:31.265] [UN] PID: 13275 TASK: ffff93b8a7dd9e40 CPU: 4 COMMAND: "python3-openloo" <--- block same instant
>
> crash> bt 13279
> PID: 13279 TASK: ffff93b8a7d90000 CPU: 1 COMMAND: "python3-openloo"
>  #0 [ffffaae1c0a77c20] __schedule at ffffffffa19d4239
>  #1 [ffffaae1c0a77cb0] schedule at ffffffffa19d46b9
>  #2 [ffffaae1c0a77cc8] rwsem_down_write_slowpath at ffffffffa1131ada  <------------- T+1: pid 13279 down_write(... lock_sem); blocks due to pid 13275
>  #3 [ffffaae1c0a77d98] _cifsFileInfo_put at ffffffffc0759818 [cifs]
>  #4 [ffffaae1c0a77e78] cifs_close at ffffffffc0759b4f [cifs]
>  #5 [ffffaae1c0a77e88] __fput at ffffffffa12f6e51
>  #6 [ffffaae1c0a77ed0] task_work_run at ffffffffa10fc827
>  #7 [ffffaae1c0a77f08] exit_to_usermode_loop at ffffffffa1003f83
>  #8 [ffffaae1c0a77f38] do_syscall_64 at ffffffffa10043c2
>  #9 [ffffaae1c0a77f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
>     RIP: 00007fc6caca1778  RSP: 00007ffed0c43de8  RFLAGS: 00000246
>     RAX: 0000000000000000  RBX: 00007fc6bd5a5f30  RCX: 00007fc6caca1778
>     RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000003
>     RBP: 0000558ebb9ca0d0   R8: 00007fc6cab941a8   R9: 0000000000000000
>     R10: 0000558ebb9ca0d0  R11: 0000000000000246  R12: 0000000000000000
>     R13: 0000000000000000  R14: 00007fc6cab4df0c  R15: 00007ffed0c43f10
>     ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
>
> crash> bt 13275
> PID: 13275 TASK: ffff93b8a7dd9e40 CPU: 4 COMMAND: "python3-openloo"
>  #0 [ffffaae1c0a57a00] __schedule at ffffffffa19d4239
>  #1 [ffffaae1c0a57a90] schedule at ffffffffa19d46b9
>  #2 [ffffaae1c0a57aa8] rwsem_down_read_slowpath at ffffffffa19d72b4
>  #3 [ffffaae1c0a57b60] cifs_reopen_file at ffffffffc0755352 [cifs]  <------------- T+2: pid 13275 down_read(... lock_sem); blocks due to pid 13279
>  #4 [ffffaae1c0a57c08] cifs_readpage_worker at ffffffffc07575f0 [cifs]
>  #5 [ffffaae1c0a57cd0] cifs_readpage at ffffffffc0758096 [cifs]
>  #6 [ffffaae1c0a57d08] generic_file_read_iter at ffffffffa123acd2
>  #7 [ffffaae1c0a57e10] cifs_strict_readv at ffffffffc07606a5 [cifs]  <------------- T+0: pid 13275 down_read(... lock_sem); does not block
>  #8 [ffffaae1c0a57e40] new_sync_read at ffffffffa12f336a
>  #9 [ffffaae1c0a57ec8] vfs_read at ffffffffa12f5fad
> #10 [ffffaae1c0a57f00] ksys_read at ffffffffa12f62df
> #11 [ffffaae1c0a57f38] do_syscall_64 at ffffffffa10042bb
> #12 [ffffaae1c0a57f50] entry_SYSCALL_64_after_hwframe at ffffffffa1a0008c
>     RIP: 00007f6ed0b7c055  RSP: 00007ffeea637e68  RFLAGS: 00000246
>     RAX: ffffffffffffffda  RBX: 00007f6ed05cd600  RCX: 00007f6ed0b7c055
>     RDX: 000000000000001e  RSI: 00007f6ec3404c10  RDI: 0000000000000003
>     RBP: 000000000000001e   R8: 00007f6ed0a6f1a8   R9: 0000000000000000
>     R10: 00007f6ed090faf0  R11: 0000000000000246  R12: 00007f6ec3480f40
>     R13: 0000000000000003  R14: 00007f6ec3404c10  R15: 000055c449ea60d0
>     ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b
>
>
> Here is my tracing from the attached debug patch, which shows the
> conditions and points at the bugzilla we have open:
> [ 233.588248] CIFS VFS: file=fs/cifs/file.c line=480 pid=13279 cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> [ 233.589462] CIFS VFS: file=fs/cifs/file.c line=311 pid=13279 cifs_sem_op op=down_write sem=ffff93b8a7424bd8
> [ 233.592333] CIFS VFS: file=fs/cifs/file.c line=314 pid=13275 cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> [ 233.612324] CIFS VFS: file=fs/cifs/file.c line=314 pid=13279 cifs_sem_op op=up_write sem=ffff93b8a7424bd8
> [ 233.612510] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13275 cifs_sem_op op=down_read sem=ffff93b8a7424bd8
> [ 233.616487] CIFS VFS: file=fs/cifs/file.c line=3920 pid=13279 cifs_sem_op op=down_read sem=ffff93b8a7424bd8   T0: https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> [ 254.389458] CIFS VFS: Send error in read = -9
> [ 254.391943] CIFS VFS: file=fs/cifs/file.c line=3926 pid=13279 cifs_sem_op op=up_read sem=ffff93b8a7424bd8
> [ 254.397453] CIFS VFS: file=fs/cifs/file.c line=471 pid=13279 cifs_sem_op op=down_write sem=ffff93b8a7424bd8   T1: https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
> [ 254.399798] CIFS VFS: file=fs/cifs/file.c line=630 pid=13275 cifs_sem_op op=down_read sem=ffff93b8a7424bd8   T2: https://bugzilla.redhat.com/show_bug.cgi?id=1760835#c9
>
>
> There was a patch that removed a lockdep warning, but it looks like
> this removed a semi-valid warning, because clearly we can deadlock
> here:
>
> commit 560d388950ceda5e7c7cdef7f3d9a8ff297bbf9d
> Author: Rabin Vincent <rabinv@xxxxxxxx>
> Date:   Wed May 3 17:17:21 2017 +0200
>
>     CIFS: silence lockdep splat in cifs_relock_file()
>
>     cifs_relock_file() can perform a down_write() on the inode's lock_sem even
>     though it was already performed in cifs_strict_readv(). Lockdep complains
>     about this. AFAICS, there is no problem here, and lockdep just needs to be
>     told that this nesting is OK.
>
>     =============================================
>     [ INFO: possible recursive locking detected ]
>     4.11.0+ #20 Not tainted
>     ---------------------------------------------
>     cat/701 is trying to acquire lock:
>      (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00
>
>     but task is already holding lock:
>      (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
>
>     other info that might help us debug this:
>      Possible unsafe locking scenario:
>
>            CPU0
>            ----
>       lock(&cifsi->lock_sem);
>       lock(&cifsi->lock_sem);
>
>      *** DEADLOCK ***
>
>      May be due to missing lock nesting notation
>
>     1 lock held by cat/701:
>      #0:  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
>
>     stack backtrace:
>     CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
>     Call Trace:
>      dump_stack+0x85/0xc2
>      __lock_acquire+0x17dd/0x2260
>      ? trace_hardirqs_on_thunk+0x1a/0x1c
>      ? preempt_schedule_irq+0x6b/0x80
>      lock_acquire+0xcc/0x260
>      ? lock_acquire+0xcc/0x260
>      ? cifs_reopen_file+0x7a7/0xc00
>      down_read+0x2d/0x70
>      ? cifs_reopen_file+0x7a7/0xc00
>      cifs_reopen_file+0x7a7/0xc00
>      ? printk+0x43/0x4b
>      cifs_readpage_worker+0x327/0x8a0
>      cifs_readpage+0x8c/0x2a0
>      generic_file_read_iter+0x692/0xd00
>      cifs_strict_readv+0x29f/0x310
>      generic_file_splice_read+0x11c/0x1c0
>      do_splice_to+0xa5/0xc0
>      splice_direct_to_actor+0xfa/0x350
>      ? generic_pipe_buf_nosteal+0x10/0x10
>      do_splice_direct+0xb5/0xe0
>      do_sendfile+0x278/0x3a0
>      SyS_sendfile64+0xc4/0xe0
>      entry_SYSCALL_64_fastpath+0x1f/0xbe
>
>     Signed-off-by: Rabin Vincent <rabinv@xxxxxxxx>
>     Acked-by: Pavel Shilovsky <pshilov@xxxxxxxxxxxxx>
>     Signed-off-by: Steve French <smfrench@xxxxxxxxx>

So, even after we have already grabbed the semaphore, it doesn't allow
us to acquire it one more time because of the conflicting waiting
down_write?

If yes, then we can create another work queue that re-locks files that
have been re-opened. In that case cifs_reopen_file would simply submit
the "re-lock" request to the queue without needing to grab the lock one
more time.

Thoughts?

--
Best regards,
Pavel Shilovsky
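To make the work-queue idea concrete, here is a rough, untested sketch
of what a deferred relock could look like. This is not an actual patch:
the cifs_relock_work / cifs_relock_worker / cifs_queue_relock names are
invented for illustration, and the fragment assumes the fs/cifs/file.c
context where cifs_relock_file() and cifsiod_wq are visible.

/* Hypothetical sketch only -- not a tested fix. */
struct cifs_relock_work {
	struct work_struct work;
	struct cifsFileInfo *cfile;	/* reference held until the work runs */
};

static void cifs_relock_worker(struct work_struct *work)
{
	struct cifs_relock_work *rw =
		container_of(work, struct cifs_relock_work, work);

	/*
	 * cifs_relock_file() takes lock_sem itself; from this clean worker
	 * context that acquisition no longer nests inside the down_read()
	 * still held by the original reader in cifs_strict_readv(), so the
	 * circular wait with the down_write() in _cifsFileInfo_put() is
	 * broken.
	 */
	cifs_relock_file(rw->cfile);

	cifsFileInfo_put(rw->cfile);
	kfree(rw);
}

/* Called from cifs_reopen_file() instead of relocking inline. */
static int cifs_queue_relock(struct cifsFileInfo *cfile)
{
	struct cifs_relock_work *rw = kzalloc(sizeof(*rw), GFP_KERNEL);

	if (!rw)
		return -ENOMEM;

	INIT_WORK(&rw->work, cifs_relock_worker);
	rw->cfile = cifsFileInfo_get(cfile);
	queue_work(cifsiod_wq, &rw->work);
	return 0;
}

The open question with something like this is what the read that
triggered the reconnect should do while the relock is still pending,
i.e. whether it can safely proceed before the brlocks have been
re-established on the server.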