2012/9/7 Jeff Layton <jlayton@xxxxxxxxxx>: > On Thu, 6 Sep 2012 18:34:45 +0400 > Pavel Shilovsky <piastryyy@xxxxxxxxx> wrote: > >> 2012/9/5 Jeff Layton <jlayton@xxxxxxxxxx>: >> > I got the following lockdep pop when testing a few patches on top of >> > Pavel's for-next branch. The relevant mount options were: >> > >> > vers=1.0,cache=strict,nounix >> > >> > The test that caused it to pop was connectathon test5 (the read/write >> > test). >> > >> > I haven't yet done any investigation but it looks like a plausible >> > deadlock possibility. Also, fwiw -- the test7 regression that I >> > reported earlier seems to have gone away for now. No idea why. >> > >> > Anyway, here's the lockdep spew: >> > >> > [ 1732.096609] ====================================================== >> > [ 1732.096609] [ INFO: possible circular locking dependency detected ] >> > [ 1732.096609] 3.6.0-0.rc4.git0.1.fc18.x86_64.debug #1 Tainted: G O >> > [ 1732.096609] ------------------------------------------------------- >> > [ 1732.096609] test5/7374 is trying to acquire lock: >> > [ 1732.096609] (sb_writers#13){.+.+.+}, at: [<ffffffff8115edd3>] generic_file_aio_write+0x63/0x100 >> > [ 1732.096609] >> > [ 1732.096609] but task is already holding lock: >> > [ 1732.096609] (&cifsi->lock_sem){+++++.}, at: [<ffffffffa02b61fc>] cifs_strict_writev+0xcc/0x1b0 [cifs] >> > [ 1732.096609] >> > [ 1732.096609] which lock already depends on the new lock. >> > [ 1732.096609] >> > [ 1732.096609] >> > [ 1732.096609] the existing dependency chain (in reverse order) is: >> > [ 1732.096609] >> > [ 1732.096609] -> #1 (&cifsi->lock_sem){+++++.}: >> > [ 1732.096609] [<ffffffff810d5d91>] lock_acquire+0xa1/0x1f0 >> > [ 1732.096609] [<ffffffff816d8721>] down_write+0x61/0xb0 >> > [ 1732.096609] [<ffffffffa02b1c8d>] cifs_new_fileinfo+0xbd/0x2e0 [cifs] >> > [ 1732.096609] [<ffffffffa02b2217>] cifs_open+0x327/0x760 [cifs] >> > [ 1732.096609] [<ffffffff811cb923>] do_dentry_open+0x263/0x310 >> > [ 1732.096609] [<ffffffff811cba1b>] finish_open+0x4b/0x70 >> > [ 1732.096609] [<ffffffff811deace>] do_last+0x6de/0xe00 >> > [ 1732.096609] [<ffffffff811df2ad>] path_openat+0xbd/0x500 >> > [ 1732.096609] [<ffffffff811dfcc1>] do_filp_open+0x41/0xa0 >> > [ 1732.096609] [<ffffffff811ccdc6>] do_sys_open+0xf6/0x1e0 >> > [ 1732.096609] [<ffffffff811cced1>] sys_open+0x21/0x30 >> > [ 1732.096609] [<ffffffff816e42a9>] system_call_fastpath+0x16/0x1b >> > [ 1732.096609] >> > [ 1732.096609] -> #0 (sb_writers#13){.+.+.+}: >> > [ 1732.096609] [<ffffffff810d4f52>] __lock_acquire+0x1372/0x1ad0 >> > [ 1732.096609] [<ffffffff810d5d91>] lock_acquire+0xa1/0x1f0 >> > [ 1732.096609] [<ffffffff811d0b87>] __sb_start_write+0xd7/0x1d0 >> > [ 1732.096609] [<ffffffff8115edd3>] generic_file_aio_write+0x63/0x100 >> > [ 1732.096609] [<ffffffffa02b6299>] cifs_strict_writev+0x169/0x1b0 [cifs] >> > [ 1732.096609] [<ffffffff811cd137>] do_sync_write+0xa7/0xe0 >> > [ 1732.096609] [<ffffffff811cda2f>] vfs_write+0xaf/0x190 >> > [ 1732.096609] [<ffffffff811cdd6d>] sys_write+0x4d/0x90 >> > [ 1732.096609] [<ffffffff816e42a9>] system_call_fastpath+0x16/0x1b >> > [ 1732.096609] >> > [ 1732.096609] other info that might help us debug this: >> > [ 1732.096609] >> > [ 1732.096609] Possible unsafe locking scenario: >> > [ 1732.096609] >> > [ 1732.096609] CPU0 CPU1 >> > [ 1732.096609] ---- ---- >> > [ 1732.096609] lock(&cifsi->lock_sem); >> > [ 1732.096609] lock(sb_writers#13); >> > [ 1732.096609] lock(&cifsi->lock_sem); >> > [ 1732.096609] lock(sb_writers#13); >> > [ 1732.096609] >> > [ 1732.096609] *** DEADLOCK *** >> > [ 1732.096609] >> > [ 1732.096609] 1 lock held by test5/7374: >> > [ 1732.096609] #0: (&cifsi->lock_sem){+++++.}, at: [<ffffffffa02b61fc>] cifs_strict_writev+0xcc/0x1b0 [cifs] >> > [ 1732.096609] >> > [ 1732.096609] stack backtrace: >> > [ 1732.096609] Pid: 7374, comm: test5 Tainted: G O 3.6.0-0.rc4.git0.1.fc18.x86_64.debug #1 >> > [ 1732.096609] Call Trace: >> > [ 1732.096609] [<ffffffff816cf0b4>] print_circular_bug+0x1fb/0x20c >> > [ 1732.096609] [<ffffffff810d4f52>] __lock_acquire+0x1372/0x1ad0 >> > [ 1732.096609] [<ffffffff8104aee2>] ? kvm_clock_read+0x32/0x40 >> > [ 1732.096609] [<ffffffff81021e39>] ? sched_clock+0x9/0x10 >> > [ 1732.096609] [<ffffffff810ac295>] ? sched_clock_local+0x25/0xa0 >> > [ 1732.096609] [<ffffffff810d5d91>] lock_acquire+0xa1/0x1f0 >> > [ 1732.096609] [<ffffffff8115edd3>] ? generic_file_aio_write+0x63/0x100 >> > [ 1732.096609] [<ffffffff811d0b87>] __sb_start_write+0xd7/0x1d0 >> > [ 1732.096609] [<ffffffff8115edd3>] ? generic_file_aio_write+0x63/0x100 >> > [ 1732.096609] [<ffffffff8115edd3>] ? generic_file_aio_write+0x63/0x100 >> > [ 1732.096609] [<ffffffff8115edd3>] generic_file_aio_write+0x63/0x100 >> > [ 1732.096609] [<ffffffffa02b6299>] cifs_strict_writev+0x169/0x1b0 [cifs] >> > [ 1732.096609] [<ffffffff811cd137>] do_sync_write+0xa7/0xe0 >> > [ 1732.096609] [<ffffffff811cda2f>] vfs_write+0xaf/0x190 >> > [ 1732.096609] [<ffffffff811cdd6d>] sys_write+0x4d/0x90 >> > [ 1732.096609] [<ffffffff816e42a9>] system_call_fastpath+0x16/0x1b >> > >> > >> > -- >> > Jeff Layton <jlayton@xxxxxxxxxx> >> >> I think we can replace generic_file_aio_write call here with something >> like cifs_generic_aio_write that is almost the same as >> generic_file_aio_write but with up_read(&cinode->lock_sem) after >> sb_start_write and down_read(&cinode->lock_sem) before sb_end_write - >> to make locking in the same order: >> >> 1) sb_writers >> 2) lock_sem >> >> thoughts? >> > > I don't know. I don't have a good feeling about this whole lock_sem > idea at all. It seems like a huge swath of this code is now running > under it. > > That's not good from a performance standpoint for one thing. It may > also lead to subtle bugs later when we make changes to this code. Parts > of it are now being serialized that may not need to be. If that > serialization is later removed, it could have "interesting" effects. > > I'm not sure what would work better however... > -- > Jeff Layton <jlayton@xxxxxxxxxx> Another way is to keep lock_sem only during cifs_find_lock_conflicts call and release just after it and before generic_file_aio_write(read). I understand that it is not the best option but it let us not hold lock_sem too long time. (it is similiar to the way we have now in VFS for mandatory locks - of course buggy). -- Best regards, Pavel Shilovsky. -- To unsubscribe from this list: send the line "unsubscribe linux-cifs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html