On Mon 25-09-17 15:59:46, Jan Kara wrote: > On Thu 21-09-17 11:48:46, Eric Whitney wrote: > > I'm seeing generic/232 fail from time to time when running a 4.14-rc1 kernel > > on xfstest-bld's most recent kvm-xfstests test appliance. In one set of > > trials, it failed in the same manner 4 out of 10 times when running the 4k test > > configuration for ext4. > > > > The failure bisects to "quota: Do not acquire dqio_sem for dquot overwrites in > > v2 format" (ab2b86360f6e). When this patch was reverted in a 4.14-rc1 kernel, > > the failure did not reoccur in a series of 20 trials. > > Thanks for debugging this! I'd just note that the commit hash of that > change is different for me - d2faa415166b2883428efa92f451774ef44373ac. > > > Example output from the failed test: > > > > QA output created by 232 > > > > Testing fsstress > > > > seed = S > > Comparing user usage > > 218a219 > > > #3740 -- 4 0 0 1 0 0 > > 245a247 > > > #45 -- 0 0 0 1 0 0 > > > > Note: I'm also seeing a similar failure for generic/233, but the patch > > containing the root cause likely comes somewhere after ab2b86360f6e. I'll post > > another bug report once I locate it. > > I'll try to debug this further. Thanks for report! Attached patch fixes the problem for me. I'll merge it through my tree. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR
>From a0ae41c2a9c204374eafd24a928e4352841bd905 Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@xxxxxxx> Date: Tue, 26 Sep 2017 10:36:05 +0200 Subject: [PATCH] quota: Fix quota corruption with generic/232 test Eric has reported that since commit d2faa415166b "quota: Do not acquire dqio_sem for dquot overwrites in v2 format" test generic/232 occasionally fails due to quota information being incorrect. Indeed that commit was too eager to remove dqio_sem completely from the path that just overwrites quota structure with updated information. Although that is innocent on its own, another process that inserts new quota structure to the same block can perform read-modify-write cycle of that block thus effectively discarding quota information update if they race in a wrong way. Fix the problem by acquiring dqio_sem for reading for overwrites of quota structure. Note that it *is* possible to completely avoid taking dqio_sem in the overwrite path however that will require modifying path inserting / deleting quota structures to avoid RMW cycles of the full block and for now it is not clear whether it is worth the hassle. Fixes: d2faa415166b2883428efa92f451774ef44373ac Reported-by: Eric Whitney <enwlinux@xxxxxxxxx> Signed-off-by: Jan Kara <jack@xxxxxxx> --- fs/quota/quota_v2.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/quota/quota_v2.c b/fs/quota/quota_v2.c index c0187cda2c1e..a73e5b34db41 100644 --- a/fs/quota/quota_v2.c +++ b/fs/quota/quota_v2.c @@ -328,12 +328,16 @@ static int v2_write_dquot(struct dquot *dquot) if (!dquot->dq_off) { alloc = true; down_write(&dqopt->dqio_sem); + } else { + down_read(&dqopt->dqio_sem); } ret = qtree_write_dquot( sb_dqinfo(dquot->dq_sb, dquot->dq_id.type)->dqi_priv, dquot); if (alloc) up_write(&dqopt->dqio_sem); + else + up_read(&dqopt->dqio_sem); return ret; } -- 2.12.3