On Tue 10-10-17 13:43:23, Jan Kara wrote:
> Hi Eryu,
>
> On Sun 08-10-17 13:42:36, Eryu Guan wrote:
> > After generic/232 failure has been reported and resolved[1], I still
> > could see fstests generic/233 failure on ext4 with v4.14-rc3 kernel.
> > This does not reproduce 100% of the time (block usage needs to exceed
> > the soft limit), but it reproduces reliably.
> >
> > seed = S
> > Comparing user usage
> > -Comparing group usage
> > +4c4
> > +< #1001 +- 32064 32000 32000 998 1000 1000
> > +---
> > +> #1001 +- 32064 32000 32000 7days 998 1000 1000
> >
> > Grace time was not printed by repquota right after the fsstress run when
> > we exceeded the block soft limit, and only printed after a quotacheck
> > was run. With v4.13 kernel, block grace time could be printed
> > immediately after the fsstress run.
>
> Well, I'd rather interpret the results as "the grace time didn't get set by
> the failing kernel, only quotacheck would set it". This configuration with
> softlimit == hardlimit is a bit ambiguous (as effectively softlimit and
> grace time are unused) and I might have shortcut setting of grace time in
> this case somewhere (which would be harmless). But still it warrants closer
> investigation. I'll have a look.
>
> > git bisect pointed the first bad to commit 7b9ca4c61bc2 ("quota: Reduce
> > contention on dq_data_lock"). And I've confirmed the bisection result by
> > reverting the commit in question and running generic/233 for 20
> > iterations without a failure.
>
> Thanks for digging into this!

OK, I've reproduced the issue (although it took me several xfstests runs to
hit this) and it is a real bug in the handling of DQUOT_ALLOC_NOFAIL quota
allocations. I'll send a fix shortly once testing completes.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR