Re: lockdep spew with Pavel's for-next branch

Pavel Shilovsky <piastryyy@xxxxxxxxx> · Thu, 6 Sep 2012 18:34:45 +0400

2012/9/5 Jeff Layton <jlayton@xxxxxxxxxx>:
> I got the following lockdep pop when testing a few patches on top of
> Pavel's for-next branch. The relevant mount options were:
>
>     vers=1.0,cache=strict,nounix
>
> The test that caused it to pop was connectathon test5 (the read/write
> test).
>
> I haven't yet done any investigation but it looks like a plausible
> deadlock possibility. Also, fwiw -- the test7 regression that I
> reported earlier seems to have gone away for now. No idea why.
>
> Anyway, here's the lockdep spew:
>
> [ 1732.096609] ======================================================
> [ 1732.096609] [ INFO: possible circular locking dependency detected ]
> [ 1732.096609] 3.6.0-0.rc4.git0.1.fc18.x86_64.debug #1 Tainted: G           O
> [ 1732.096609] -------------------------------------------------------
> [ 1732.096609] test5/7374 is trying to acquire lock:
> [ 1732.096609]  (sb_writers#13){.+.+.+}, at: [<ffffffff8115edd3>] generic_file_aio_write+0x63/0x100
> [ 1732.096609]
> [ 1732.096609] but task is already holding lock:
> [ 1732.096609]  (&cifsi->lock_sem){+++++.}, at: [<ffffffffa02b61fc>] cifs_strict_writev+0xcc/0x1b0 [cifs]
> [ 1732.096609]
> [ 1732.096609] which lock already depends on the new lock.
> [ 1732.096609]
> [ 1732.096609]
> [ 1732.096609] the existing dependency chain (in reverse order) is:
> [ 1732.096609]
> [ 1732.096609] -> #1 (&cifsi->lock_sem){+++++.}:
> [ 1732.096609]        [<ffffffff810d5d91>] lock_acquire+0xa1/0x1f0
> [ 1732.096609]        [<ffffffff816d8721>] down_write+0x61/0xb0
> [ 1732.096609]        [<ffffffffa02b1c8d>] cifs_new_fileinfo+0xbd/0x2e0 [cifs]
> [ 1732.096609]        [<ffffffffa02b2217>] cifs_open+0x327/0x760 [cifs]
> [ 1732.096609]        [<ffffffff811cb923>] do_dentry_open+0x263/0x310
> [ 1732.096609]        [<ffffffff811cba1b>] finish_open+0x4b/0x70
> [ 1732.096609]        [<ffffffff811deace>] do_last+0x6de/0xe00
> [ 1732.096609]        [<ffffffff811df2ad>] path_openat+0xbd/0x500
> [ 1732.096609]        [<ffffffff811dfcc1>] do_filp_open+0x41/0xa0
> [ 1732.096609]        [<ffffffff811ccdc6>] do_sys_open+0xf6/0x1e0
> [ 1732.096609]        [<ffffffff811cced1>] sys_open+0x21/0x30
> [ 1732.096609]        [<ffffffff816e42a9>] system_call_fastpath+0x16/0x1b
> [ 1732.096609]
> [ 1732.096609] -> #0 (sb_writers#13){.+.+.+}:
> [ 1732.096609]        [<ffffffff810d4f52>] __lock_acquire+0x1372/0x1ad0
> [ 1732.096609]        [<ffffffff810d5d91>] lock_acquire+0xa1/0x1f0
> [ 1732.096609]        [<ffffffff811d0b87>] __sb_start_write+0xd7/0x1d0
> [ 1732.096609]        [<ffffffff8115edd3>] generic_file_aio_write+0x63/0x100
> [ 1732.096609]        [<ffffffffa02b6299>] cifs_strict_writev+0x169/0x1b0 [cifs]
> [ 1732.096609]        [<ffffffff811cd137>] do_sync_write+0xa7/0xe0
> [ 1732.096609]        [<ffffffff811cda2f>] vfs_write+0xaf/0x190
> [ 1732.096609]        [<ffffffff811cdd6d>] sys_write+0x4d/0x90
> [ 1732.096609]        [<ffffffff816e42a9>] system_call_fastpath+0x16/0x1b
> [ 1732.096609]
> [ 1732.096609] other info that might help us debug this:
> [ 1732.096609]
> [ 1732.096609]  Possible unsafe locking scenario:
> [ 1732.096609]
> [ 1732.096609]        CPU0                    CPU1
> [ 1732.096609]        ----                    ----
> [ 1732.096609]   lock(&cifsi->lock_sem);
> [ 1732.096609]                                lock(sb_writers#13);
> [ 1732.096609]                                lock(&cifsi->lock_sem);
> [ 1732.096609]   lock(sb_writers#13);
> [ 1732.096609]
> [ 1732.096609]  *** DEADLOCK ***
> [ 1732.096609]
> [ 1732.096609] 1 lock held by test5/7374:
> [ 1732.096609]  #0:  (&cifsi->lock_sem){+++++.}, at: [<ffffffffa02b61fc>] cifs_strict_writev+0xcc/0x1b0 [cifs]
> [ 1732.096609]
> [ 1732.096609] stack backtrace:
> [ 1732.096609] Pid: 7374, comm: test5 Tainted: G           O 3.6.0-0.rc4.git0.1.fc18.x86_64.debug #1
> [ 1732.096609] Call Trace:
> [ 1732.096609]  [<ffffffff816cf0b4>] print_circular_bug+0x1fb/0x20c
> [ 1732.096609]  [<ffffffff810d4f52>] __lock_acquire+0x1372/0x1ad0
> [ 1732.096609]  [<ffffffff8104aee2>] ? kvm_clock_read+0x32/0x40
> [ 1732.096609]  [<ffffffff81021e39>] ? sched_clock+0x9/0x10
> [ 1732.096609]  [<ffffffff810ac295>] ? sched_clock_local+0x25/0xa0
> [ 1732.096609]  [<ffffffff810d5d91>] lock_acquire+0xa1/0x1f0
> [ 1732.096609]  [<ffffffff8115edd3>] ? generic_file_aio_write+0x63/0x100
> [ 1732.096609]  [<ffffffff811d0b87>] __sb_start_write+0xd7/0x1d0
> [ 1732.096609]  [<ffffffff8115edd3>] ? generic_file_aio_write+0x63/0x100
> [ 1732.096609]  [<ffffffff8115edd3>] ? generic_file_aio_write+0x63/0x100
> [ 1732.096609]  [<ffffffff8115edd3>] generic_file_aio_write+0x63/0x100
> [ 1732.096609]  [<ffffffffa02b6299>] cifs_strict_writev+0x169/0x1b0 [cifs]
> [ 1732.096609]  [<ffffffff811cd137>] do_sync_write+0xa7/0xe0
> [ 1732.096609]  [<ffffffff811cda2f>] vfs_write+0xaf/0x190
> [ 1732.096609]  [<ffffffff811cdd6d>] sys_write+0x4d/0x90
> [ 1732.096609]  [<ffffffff816e42a9>] system_call_fastpath+0x16/0x1b
>
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>

I think we can replace generic_file_aio_write call here with something
like cifs_generic_aio_write that is almost the same as
generic_file_aio_write but with up_read(&cinode->lock_sem) after
sb_start_write and down_read(&cinode->lock_sem) before sb_end_write -
to make locking in the same order:

1) sb_writers
2) lock_sem

thoughts?

-- 
Best regards,
Pavel Shilovsky.
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html