On 07/26/2012 05:55 PM, Dave Chinner wrote:
On Thu, Jul 26, 2012 at 06:35:05PM +1000, Dave Chinner wrote:
From: Dave Chinner <dchinner@xxxxxxxxxx>
Remount won't run a quota check - it's only done during mount. Hence
all quota tests using this check function are not actually
validating XFS filesystems right now.
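One way to see whether a given mount actually ran a quotacheck is to look for the kernel's "Quotacheck: Done." message, which appears in the dmesg output further down this thread. A minimal sketch, assuming the kernel log is piped in on stdin; the function name is made up for illustration:

```shell
# Hypothetical helper: count completed XFS quotachecks for one device
# in kernel log text read from stdin.
# $1: device name as it appears in the log, e.g. "vdb"
count_quotachecks() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so mask the exit status to keep callers using "set -e" happy.
    grep -c "XFS ($1): Quotacheck: Done\." || true
}

# Example: dmesg | count_quotachecks vdb
```

If the count does not go up across the remount pair, no quotacheck was run for that device.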
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
FWIW, this change is exposing some problems in the new dquot code:
---
common.quota | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/common.quota b/common.quota
index 9736306..2fa784 100644
--- a/common.quota
+++ b/common.quota
@@ -236,6 +236,11 @@ _check_quota_usage()
{
# Sync to get delalloc to disk
sync
+
+ # kill caches to guarantee removal speculative delalloc
+ # XXX: really need an ioctl instead of this big hammer
+ echo 3 > /proc/sys/vm/drop_caches
+
Some kind of locking issue is present:
[ 1871.738970] XFS (vdb): Quotacheck: Done.
[ 1877.795774] ------------[ cut here ]------------
[ 1877.797347] WARNING: at kernel/mutex-debug.c:78 debug_mutex_unlock+0xda/0xe0()
[ 1877.799416] Hardware name: Bochs
[ 1877.799416] Modules linked in:
[ 1877.799416] Pid: 2261, comm: 232 Not tainted 3.5.0-rc5-dgc+ #313
[ 1877.799416] Call Trace:
[ 1877.799416] [<ffffffff8107a83f>] warn_slowpath_common+0x7f/0xc0
[ 1877.799416] [<ffffffff8107a89a>] warn_slowpath_null+0x1a/0x20
[ 1877.799416] [<ffffffff810d022a>] debug_mutex_unlock+0xda/0xe0
[ 1877.799416] [<ffffffff81b4c97c>] __mutex_unlock_slowpath+0x7c/0x130
[ 1877.799416] [<ffffffff81b4ca3e>] mutex_unlock+0xe/0x10
[ 1877.799416] [<ffffffff814b12d8>] xfs_qm_dqreclaim_one+0x178/0x3d0
[ 1877.799416] [<ffffffff814b1620>] xfs_qm_shake+0xf0/0x170
[ 1877.799416] [<ffffffff81137789>] shrink_slab+0x169/0x350
[ 1877.799416] [<ffffffff81709b04>] ? do_raw_spin_lock+0x54/0x120
[ 1877.799416] [<ffffffff8118a488>] ? iput+0x48/0x210
[ 1877.799416] [<ffffffff8119b433>] drop_caches_sysctl_handler+0x73/0xa0
[ 1877.799416] [<ffffffff811de863>] proc_sys_call_handler.isra.11+0xb3/0xd0
[ 1877.799416] [<ffffffff811de898>] proc_sys_write+0x18/0x20
[ 1877.799416] [<ffffffff81170298>] vfs_write+0xa8/0x160
[ 1877.799416] [<ffffffff8117058a>] sys_write+0x4a/0x90
[ 1877.799416] [<ffffffff81b57269>] system_call_fastpath+0x16/0x1b
[ 1877.799416] ---[ end trace 4f2a89b2cbd5e64f ]---
which is:
DEBUG_LOCKS_WARN_ON(lock->owner != current);
so something other than the task that locked the mutex unlocked it,
or we are unlocking an unlocked dquot...
VFS_QUOTA=0
case $FSTYP in
ext2|ext3|ext4|ext4dev|reiserfs)
@@ -253,8 +258,9 @@ _check_quota_usage()
quotacheck -u -g $SCRATCH_MNT 2>/dev/null
else
# use XFS method to force quotacheck
- mount -o remount,noquota $SCRATCH_DEV
- mount -o remount,usrquota,grpquota $SCRATCH_DEV
+ xfs_quota -x -c "off -ug" $SCRATCH_MNT
And this is hanging with what appears to be a reference counting bug
when purging dquots in generic/233:
# echo w > /proc/sysrq-trigger
[53710.206100] SysRq : Show Blocked State
[53710.207213] task PC stack pid father
[53710.208749] xfs_quota D ffff88003fc12880 3896 18147 17936 0x00000000
[53710.209738] ffff88000f3afc18 0000000000000086 ffff88001cb160c0 ffff88000f3affd8
[53710.209738] ffff88000f3affd8 ffff88000f3affd8 ffffffff81f9b420 ffff88001cb160c0
[53710.209738] ffff88000f3afc08 ffffffff821ece80 ffff88000f3afc50 0000000100cbbe68
[53710.209738] Call Trace:
[53710.209738] [<ffffffff81b4dea9>] schedule+0x29/0x70
[53710.209738] [<ffffffff81b4bcad>] schedule_timeout+0x13d/0x2c0
[53710.209738] [<ffffffff81089f90>] ? usleep_range+0x50/0x50
[53710.209738] [<ffffffff814aea90>] ? xfs_qm_need_dqattach+0x70/0x70
[53710.209738] [<ffffffff81b4be4e>] schedule_timeout_uninterruptible+0x1e/0x20
[53710.209738] [<ffffffff814aeef3>] xfs_qm_dquot_walk+0x153/0x170
[53710.209738] [<ffffffff816fb81b>] ? radix_tree_lookup+0xb/0x10
[53710.209738] [<ffffffff8149772a>] ? xfs_perag_get+0x3a/0x120
[53710.209738] [<ffffffff814ace60>] ? xfs_trans_free_dqinfo+0x40/0x40
[53710.209738] [<ffffffff81448aef>] ? xfs_inode_ag_iterator+0x8f/0xa0
[53710.209738] [<ffffffff814aef93>] xfs_qm_dqpurge_all+0x83/0x90
[53710.209738] [<ffffffff814ae4b9>] xfs_qm_scall_quotaoff+0x139/0x350
[53710.209738] [<ffffffff814b2780>] xfs_fs_set_xstate+0xd0/0xf0
[53710.209738] [<ffffffff811d1088>] sys_quotactl+0x1f8/0x740
[53710.209738] [<ffffffff81174d7a>] ? sys_newstat+0x2a/0x40
[53710.209738] [<ffffffff81b52635>] ? do_async_page_fault+0x35/0x90
[53710.209738] [<ffffffff81b57269>] system_call_fastpath+0x16/0x1b
It's hitting a dquot that either has the FREEING flag set or an
elevated reference count, so it is skipping it. It gets stuck in the
loop forever retrying. That's probably related to the above lock
issue.
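For spotting this kind of hang without sysrq, tasks in uninterruptible sleep also show up with a "D" state in ps. A rough sketch, assuming input in the form produced by `ps -eo pid,stat,comm`; the helper name is hypothetical:

```shell
# Hypothetical helper: print the pid and command of every task whose
# state starts with "D" (uninterruptible sleep).
# Expects "PID STAT COMMAND" lines on stdin, header line included.
list_blocked_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Example: ps -eo pid,stat,comm | list_blocked_tasks
```

Unlike sysrq-w, this gives no stack trace, so it only tells you that a task such as xfs_quota is stuck, not where.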
And generic/231 fails with a significant accounting difference:
generic/231 [failed, exit status 1] - output mismatch (see tests/generic/231.out.bad)
--- tests/generic/231.out 2012-07-26 18:42:30.000000000 +1000
+++ results/generic/231.out.bad 2012-07-27 08:24:22.000000000 +1000
@@ -2,15 +2,7 @@
=== FSX Standard Mode, Memory Mapping, 1 Tasks ===
All operations completed A-OK!
Comparing user usage
-Comparing group usage
-=== FSX Standard Mode, Memory Mapping, 4 Tasks ===
-All operations completed A-OK!
-All operations completed A-OK!
-All operations completed A-OK!
-All operations completed A-OK!
-Comparing user usage
-Comparing group usage
-=== FSX Standard Mode, Memory Mapping, 1 Tasks ===
-All operations completed A-OK!
-Comparing user usage
-Comparing group usage
+4c4
+< #1001 -- 524 0 0 3 0 0
+---
+> #1001 -- 316 0 0 3 0 0
generic/270 and generic/233 give a similar mismatch when they don't
hang.
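For a sense of scale, the gap between the two report lines in the generic/231 diff can be worked out directly. A minimal sketch, assuming the third whitespace-separated field is the "blocks used" column of a repquota-style line; the helper name is made up:

```shell
# Hypothetical helper: print the difference in the assumed "blocks used"
# column (field 3) between two repquota-style report lines.
# $1: line from the first report, $2: line from the second report
quota_block_delta() {
    a=$(printf '%s\n' "$1" | awk '{ print $3 }')
    b=$(printf '%s\n' "$2" | awk '{ print $3 }')
    echo $((a - b))
}

# The two lines for id 1001 from the 231.out.bad diff:
quota_block_delta "#1001 -- 524 0 0 3 0 0" "#1001 -- 316 0 0 3 0 0"
```

For the lines above this prints 208, while the inode column (3 on both sides) agrees.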
So, yeah, we haven't been verifying the quota accounting code as
well as we should have been for some time now....
Cheers,
Dave.
I did see the hang sometimes, as well as the accounting mismatch. Dave, do
you want to look into this further? Otherwise I am OK with approving
this patch and fixing the accounting and lockup under another bug,
because this patch is the way to work around the remount issue. I will
leave it up to you.
Reviewed-by: Rich Johnston <rjohnston@xxxxxxx>