Hi all,

For the last few kernel releases we've been seeing this pretty regularly:

======================================================
[ INFO: possible circular locking dependency detected ]
4.5.0-rc3-ceph-00790-g2364ac5 #1 Not tainted
-------------------------------------------------------
tp_fstore_op/25647 is trying to acquire lock:
 (&sb->s_type->i_mutex_key#19){+.+.+.}, at: [<ffffffffa0574bad>] xfs_file_buffered_aio_write+0x6d/0x300 [xfs]

but task is already holding lock:
 (&pipe->mutex/1){+.+.+.}, at: [<ffffffff8126347d>] pipe_lock+0x1d/0x20

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&pipe->mutex/1){+.+.+.}:
       [<ffffffff81100842>] lock_acquire+0x162/0x1b0
       [<ffffffff817d77c9>] mutex_lock_nested+0x79/0x380
       [<ffffffff8126347d>] pipe_lock+0x1d/0x20
       [<ffffffff81290d20>] splice_to_pipe+0x30/0x250
       [<ffffffff81292379>] __generic_file_splice_read+0x4f9/0x620
       [<ffffffff812924de>] generic_file_splice_read+0x3e/0x70
       [<ffffffffa05738d4>] xfs_file_splice_read+0x194/0x220 [xfs]
       [<ffffffff812912da>] do_splice_to+0x6a/0x80
       [<ffffffff812934e1>] SyS_splice+0x6c1/0x720
       [<ffffffff817da4b2>] entry_SYSCALL_64_fastpath+0x12/0x76

-> #1 (&(&ip->i_iolock)->mr_lock){++++++}:
       [<ffffffff81100842>] lock_acquire+0x162/0x1b0
       [<ffffffff810fa37f>] down_write_nested+0x4f/0x80
       [<ffffffffa058106d>] xfs_ilock+0xed/0x1b0 [xfs]
       [<ffffffffa05801c0>] xfs_vn_setattr+0x30/0x70 [xfs]
       [<ffffffff8127a09d>] notify_change+0x1ed/0x320
       [<ffffffff8125843c>] do_truncate+0x7c/0xc0
       [<ffffffff81258843>] do_sys_ftruncate.constprop.12+0x153/0x170
       [<ffffffff8125889e>] SyS_ftruncate+0xe/0x10
       [<ffffffff817da4b2>] entry_SYSCALL_64_fastpath+0x12/0x76

-> #0 (&sb->s_type->i_mutex_key#19){+.+.+.}:
       [<ffffffff810ff57e>] __lock_acquire+0x13be/0x1e30
       [<ffffffff81100842>] lock_acquire+0x162/0x1b0
       [<ffffffff817d77c9>] mutex_lock_nested+0x79/0x380
       [<ffffffffa0574bad>] xfs_file_buffered_aio_write+0x6d/0x300 [xfs]
       [<ffffffffa0574ed5>] xfs_file_write_iter+0x95/0x140 [xfs]
       [<ffffffff81259ee5>] vfs_iter_write+0x75/0xb0
       [<ffffffff81291a51>] iter_file_splice_write+0x281/0x3a0
       [<ffffffff812932fa>] SyS_splice+0x4da/0x720
       [<ffffffff817da4b2>] entry_SYSCALL_64_fastpath+0x12/0x76

other info that might help us debug this:

Chain exists of:
  &sb->s_type->i_mutex_key#19 --> &(&ip->i_iolock)->mr_lock --> &pipe->mutex/1

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&pipe->mutex/1);
                               lock(&(&ip->i_iolock)->mr_lock);
                               lock(&pipe->mutex/1);
  lock(&sb->s_type->i_mutex_key#19);

 *** DEADLOCK ***

2 locks held by tp_fstore_op/25647:
 #0:  (sb_writers#15){.+.+.+}, at: [<ffffffff8125db7d>] __sb_start_write+0xad/0xe0
 #1:  (&pipe->mutex/1){+.+.+.}, at: [<ffffffff8126347d>] pipe_lock+0x1d/0x20

stack backtrace:
CPU: 4 PID: 25647 Comm: tp_fstore_op Not tainted 4.5.0-rc3-ceph-00790-g2364ac5 #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 1.0c 09/07/2015
 0000000000000086 00000000547c0aa5 ffff880858c0bad8 ffffffff813f8661
 ffffffff827127a0 ffffffff82711870 ffff880858c0bb18 ffffffff811c5d39
 ffff880858c0bb70 ffff8808530288d8 0000000000000001 0000000000000002
Call Trace:
 [<ffffffff813f8661>] dump_stack+0x85/0xc4
 [<ffffffff811c5d39>] print_circular_bug+0x200/0x20e
 [<ffffffff810ff57e>] __lock_acquire+0x13be/0x1e30
 [<ffffffff81100842>] lock_acquire+0x162/0x1b0
 [<ffffffffa0574bad>] ? xfs_file_buffered_aio_write+0x6d/0x300 [xfs]
 [<ffffffff817d77c9>] mutex_lock_nested+0x79/0x380
 [<ffffffffa0574bad>] ? xfs_file_buffered_aio_write+0x6d/0x300 [xfs]
 [<ffffffff810d338d>] ? ttwu_stat+0x19d/0x200
 [<ffffffffa0574bad>] xfs_file_buffered_aio_write+0x6d/0x300 [xfs]
 [<ffffffff810fdc6f>] ? mark_held_locks+0x6f/0x80
 [<ffffffff810fdde3>] ? trace_hardirqs_on_caller+0x163/0x1d0
 [<ffffffffa0574ed5>] xfs_file_write_iter+0x95/0x140 [xfs]
 [<ffffffff81259ee5>] vfs_iter_write+0x75/0xb0
 [<ffffffff81291a51>] iter_file_splice_write+0x281/0x3a0
 [<ffffffff812932fa>] SyS_splice+0x4da/0x720
 [<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70
 [<ffffffff817da4b2>] entry_SYSCALL_64_fastpath+0x12/0x76

We're not actually doing racing truncate and splice (or any racing
operations, for that matter) to the same files, so we don't actually hit
this, but the lockdep warning is enough to make our test runs fail, and
we'd rather not whitelist it if it's something that should get fixed
anyway.

My apologies for not raising this sooner; I thought I'd already emailed
the list about this a few months ago, but it looks like I'd only pinged
#xfs.

sage

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs