On Fri, Apr 08, 2016 at 01:18:44PM -0400, Brian Foster wrote: > On Fri, Apr 08, 2016 at 09:37:45AM +1000, Dave Chinner wrote: > > Hi folks, > > > > This is the second version of this patch set, first posted and > > described here: > > > > http://oss.sgi.com/archives/xfs/2016-04/msg00069.html > > > > The only change from the first version is splitting up the first > > patch into two as Christoph requested - one for the bug fix, the > > other for the variable renaming. > > > > Did your xfstests testing for this series include generic/233? I'm > seeing a consistently reproducible test hang. The test is hanging on a > "xfs_quota -x -c off -ug /mnt/scratch" command. The stack is as follows: > > [<ffffffffa0772306>] xfs_qm_dquot_walk.isra.8+0x196/0x1b0 [xfs] > [<ffffffffa0774a98>] xfs_qm_dqpurge_all+0x78/0x80 [xfs] > [<ffffffffa07713e8>] xfs_qm_scall_quotaoff+0x148/0x640 [xfs] > [<ffffffffa077733d>] xfs_quota_disable+0x3d/0x50 [xfs] > [<ffffffff812c27e3>] SyS_quotactl+0x3b3/0x8c0 > [<ffffffff81003e17>] do_syscall_64+0x67/0x190 > [<ffffffff81763f7f>] return_from_SYSCALL_64+0x0/0x7a > [<ffffffffffffffff>] 0xffffffffffffffff > > ... and it looks like the kernel is spinning somehow or another between > inode reclaim and xfsaild: > > ... > kworker/1:2-210 [001] ...1 895.750591: xfs_perag_get_tag: dev 253:3 agno 1 refcount 1 caller xfs_reclaim_inodes_ag [xfs] > kworker/1:2-210 [001] ...1 895.750609: xfs_perag_put: dev 253:3 agno 1 refcount 0 caller xfs_reclaim_inodes_ag [xfs] > kworker/1:2-210 [001] ...1 895.750609: xfs_perag_get_tag: dev 253:3 agno 2 refcount 5 caller xfs_reclaim_inodes_ag [xfs] > kworker/1:2-210 [001] ...1 895.750611: xfs_perag_put: dev 253:3 agno 2 refcount 4 caller xfs_reclaim_inodes_ag [xfs] > kworker/1:2-210 [001] ...1 895.750612: xfs_perag_get_tag: dev 253:3 agno 3 refcount 1 caller xfs_reclaim_inodes_ag [xfs] > kworker/1:2-210 [001] ...1 895.750613: xfs_perag_put: dev 253:3 agno 3 refcount 0 caller xfs_reclaim_inodes_ag [xfs] > xfsaild/dm-3-12406 [003] ...2 895.760588: xfs_ail_locked: dev 253:3 lip 0xffff8801f8e65d80 lsn 2/5709 type XFS_LI_QUOTAOFF flags IN_AIL > xfsaild/dm-3-12406 [003] ...2 895.810595: xfs_ail_locked: dev 253:3 lip 0xffff8801f8e65d80 lsn 2/5709 type XFS_LI_QUOTAOFF flags IN_AIL > xfsaild/dm-3-12406 [003] ...2 895.860586: xfs_ail_locked: dev 253:3 lip 0xffff8801f8e65d80 lsn 2/5709 type XFS_LI_QUOTAOFF flags IN_AIL > xfsaild/dm-3-12406 [003] ...2 895.910596: xfs_ail_locked: dev 253:3 lip 0xffff8801f8e65d80 lsn 2/5709 type XFS_LI_QUOTAOFF flags IN_AIL > ... No deadlock involving the AIL - it doesn't remove the XFS_LI_QUOTAOFF from the AIL - the quota code committing the quotaoff-end transactions is what removes that. IOWs, the dquot walk has not completed, so quotaoff has not completed, so the XFS_LI_QUOTAOFF is still in the AIL. IOWs, this looks like xfs_qm_dquot_walk() is skipping dquots because xfs_qm_dqpurge is hitting this: xfs_dqlock(dqp); if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) { xfs_dqunlock(dqp); return -EAGAIN; } So that means we've got an inode that probably hasn't been reclaimed, because the last thing that happens during reclaim is the dquots are detatched from the inode and hence the reference counts are dropped. > FWIW, this only occurs with patch 6 applied. The test and scratch > devices are both 10GB lvm volumes formatted with mkfs defaults (v5). I can't see how patch 6 would prevent an inode from being reclaimed, as all the changes occur *after* the reclaim decision has been made. More investigation needed, I guess... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs