Hello, We're seeing a deadlock on xfs in a few kernels in production and are having a hard time figuring out what's happening. Here is a breakdown of the stack traces on a box I could get to before it was rebooted, all the boxes we've found have been similar 100 hits: [<ffffffff813bd7ae>] xlog_grant_head_wait+0xbe/0x1e0 [<ffffffff813bd958>] xlog_grant_head_check+0x88/0xe0 [<ffffffff813bff89>] xfs_log_reserve+0xc9/0x1c0 [<ffffffff813ba3dd>] xfs_trans_reserve+0x17d/0x1f0 [<ffffffff813bb72e>] xfs_trans_alloc+0xbe/0x130 [<ffffffff813a96ae>] xfs_vn_update_time+0x4e/0x130 [<ffffffff812563dd>] file_update_time+0xbd/0x130 [<ffffffff8139f964>] xfs_file_aio_write_checks+0x144/0x1b0 [<ffffffff8139fd9f>] xfs_file_buffered_aio_write+0x6f/0x250 [<ffffffff813a009d>] xfs_file_write_iter+0x11d/0x160 [<ffffffff81238233>] __do_readv_writev+0x2d3/0x3b0 [<ffffffff8123838d>] do_readv_writev+0x7d/0xb0 [<ffffffff81238774>] do_writev+0x74/0x100 [<ffffffff81239500>] SyS_writev+0x10/0x20 [<ffffffff81002b03>] do_syscall_64+0x53/0x150 [<ffffffff8196c7c6>] entry_SYSCALL64_slow_path+0x25/0x25 [<ffffffffffffffff>] 0xffffffffffffffff ----- [<ffffffff813bd7ae>] xlog_grant_head_wait+0xbe/0x1e0 [<ffffffff813bd958>] xlog_grant_head_check+0x88/0xe0 [<ffffffff813bff89>] xfs_log_reserve+0xc9/0x1c0 [<ffffffff813ba3dd>] xfs_trans_reserve+0x17d/0x1f0 [<ffffffff813bb72e>] xfs_trans_alloc+0xbe/0x130 [<ffffffff813a96ae>] xfs_vn_update_time+0x4e/0x130 [<ffffffff812563dd>] file_update_time+0xbd/0x130 [<ffffffff8139f964>] xfs_file_aio_write_checks+0x144/0x1b0 [<ffffffff8139fd9f>] xfs_file_buffered_aio_write+0x6f/0x250 [<ffffffff813a009d>] xfs_file_write_iter+0x11d/0x160 [<ffffffff81236da9>] new_sync_write+0xd9/0x120 [<ffffffff81237f0d>] vfs_write+0x18d/0x1e0 [<ffffffff81239228>] SyS_write+0x48/0xa0 [<ffffffff81002b03>] do_syscall_64+0x53/0x150 [<ffffffff8196c7c6>] entry_SYSCALL64_slow_path+0x25/0x25 [<ffffffffffffffff>] 0xffffffffffffffff ----- 10 hits: [<ffffffff813bd7ae>] xlog_grant_head_wait+0xbe/0x1e0 [<ffffffff813bd958>] xlog_grant_head_check+0x88/0xe0 [<ffffffff813bff89>] xfs_log_reserve+0xc9/0x1c0 [<ffffffff813ba3dd>] xfs_trans_reserve+0x17d/0x1f0 [<ffffffff813bb72e>] xfs_trans_alloc+0xbe/0x130 [<ffffffff813aedfd>] xfs_remove+0x10d/0x2d0 [<ffffffff813a9cfa>] xfs_vn_unlink+0x4a/0x90 [<ffffffff81249547>] vfs_unlink+0xf7/0x180 [<ffffffff8124a26f>] do_unlinkat+0x25f/0x2e0 [<ffffffff8124afc6>] SyS_unlink+0x16/0x20 [<ffffffff81002b03>] do_syscall_64+0x53/0x150 [<ffffffff8196c7c6>] entry_SYSCALL64_slow_path+0x25/0x25 [<ffffffffffffffff>] 0xffffffffffffffff ----- 2 hits: [<ffffffff813bd7ae>] xlog_grant_head_wait+0xbe/0x1e0 [<ffffffff813bd958>] xlog_grant_head_check+0x88/0xe0 [<ffffffff813bff89>] xfs_log_reserve+0xc9/0x1c0 [<ffffffff813ba3dd>] xfs_trans_reserve+0x17d/0x1f0 [<ffffffff813bb72e>] xfs_trans_alloc+0xbe/0x130 [<ffffffff813a96ae>] xfs_vn_update_time+0x4e/0x130 [<ffffffff812563dd>] file_update_time+0xbd/0x130 [<ffffffff8139f964>] xfs_file_aio_write_checks+0x144/0x1b0 [<ffffffff8139fc0f>] xfs_file_dio_aio_write+0x10f/0x230 [<ffffffff813a0028>] xfs_file_write_iter+0xa8/0x160 [<ffffffff81236da9>] new_sync_write+0xd9/0x120 [<ffffffff81237f0d>] vfs_write+0x18d/0x1e0 [<ffffffff81239228>] SyS_write+0x48/0xa0 [<ffffffff81002b03>] do_syscall_64+0x53/0x150 [<ffffffff8196c7c6>] entry_SYSCALL64_slow_path+0x25/0x25 [<ffffffffffffffff>] 0xffffffffffffffff ----- 1 hit: [<ffffffff813bd7ae>] xlog_grant_head_wait+0xbe/0x1e0 [<ffffffff813bd958>] xlog_grant_head_check+0x88/0xe0 [<ffffffff813bff89>] xfs_log_reserve+0xc9/0x1c0 [<ffffffff813ba3dd>] xfs_trans_reserve+0x17d/0x1f0 [<ffffffff813bb72e>] xfs_trans_alloc+0xbe/0x130 [<ffffffff813931ae>] xfs_setfilesize_trans_alloc.isra.14+0x2e/0x70 [<ffffffff81393a62>] xfs_submit_ioend+0x102/0x190 [<ffffffff81394288>] xfs_vm_writepages+0xa8/0xd0 [<ffffffff811b784f>] do_writepages+0x1f/0x30 [<ffffffff81268ce5>] __writeback_single_inode+0x45/0x320 [<ffffffff812694cb>] writeback_sb_inodes+0x25b/0x4e0 [<ffffffff8126980f>] __writeback_inodes_wb+0xbf/0x100 [<ffffffff81269b45>] wb_writeback+0x245/0x2d0 [<ffffffff8126a44b>] wb_workfn+0x14b/0x310 [<ffffffff810a68f0>] process_one_work+0x150/0x410 [<ffffffff810a721d>] worker_thread+0x6d/0x520 [<ffffffff810ad3fc>] kthread+0x12c/0x160 [<ffffffff8196c959>] ret_from_fork+0x29/0x40 [<ffffffffffffffff>] 0xffffffffffffffff ----- [<ffffffff813bd7ae>] xlog_grant_head_wait+0xbe/0x1e0 [<ffffffff813bd958>] xlog_grant_head_check+0x88/0xe0 [<ffffffff813bff89>] xfs_log_reserve+0xc9/0x1c0 [<ffffffff813ba3dd>] xfs_trans_reserve+0x17d/0x1f0 [<ffffffff813bb72e>] xfs_trans_alloc+0xbe/0x130 [<ffffffff813ad890>] xfs_create+0x190/0x6d0 [<ffffffff813aae49>] xfs_generic_create+0x219/0x2b0 [<ffffffff813aaf14>] xfs_vn_mknod+0x14/0x20 [<ffffffff813aaf53>] xfs_vn_create+0x13/0x20 [<ffffffff81248adb>] path_openat+0x13bb/0x1470 [<ffffffff8124a62b>] do_filp_open+0x8b/0x140 [<ffffffff8123634c>] do_sys_open+0x11c/0x1f0 [<ffffffff8123643e>] SyS_open+0x1e/0x20 [<ffffffff81002b03>] do_syscall_64+0x53/0x150 [<ffffffff8196c7c6>] entry_SYSCALL64_slow_path+0x25/0x25 [<ffffffffffffffff>] 0xffffffffffffffff The only "fishy" thing is in our kernels (4.6, 4.11, and 4.16) xfs_vm_writepages calls xfs_submit_ioend with the page locked, whereas upstream doesn't. However the change that introduced this is 8e1f065bea1b ("xfs: refactor the tail of xfs_writepage_map") which just moved things around, and reading the code it looks like the locking order is page lock -> trans reservation, so not necessarily a problem. Does this ring bells for anybody? If not I'll just keep digging through boxes that trip over this until I figure out what's going on. Thanks, Josef