On 2024/11/17 1:59, Ojaswin Mujoo wrote:
On Sat, Nov 16, 2024 at 02:20:26AM +0530, Ritesh Harjani wrote:
Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> writes:
One of the paths quota writeback is called from is:
freeze_super()
sync_filesystem()
ext4_sync_fs()
dquot_writeback_dquots()
Since we currently don't always flush the quota_release_work queue in
this path, we can end up with the following race:
1. dquot are added to releasing_dquots list during regular operations.
2. FS freeze starts, however, this does not flush the quota_release_work queue.
3. Freeze completes.
4. Kernel eventually tries to flush the workqueue while FS is frozen which
hits a WARN_ON since transaction gets started during frozen state:
ext4_journal_check_start+0x28/0x110 [ext4] (unreliable)
__ext4_journal_start_sb+0x64/0x1c0 [ext4]
ext4_release_dquot+0x90/0x1d0 [ext4]
quota_release_workfn+0x43c/0x4d0
Which is the following line:
WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
Which ultimately results in generic/390 failing due to dmesg
noise. This was detected on powerpc machine 15 cores.
To avoid this, make sure to flush the workqueue during
dquot_writeback_dquots() so we dont have any pending workitems after
freeze.
Not just that, sync_filesystem can also be called from other places and
quota_release_workfn() could write out and and release the dquot
structures if such are found during processing of releasing_dquots list.
IIUC, this was earlier done in the same dqput() context but had races
with dquot_mark_dquot_dirty(). Hence the final dqput() will now add the
dquot structures to releasing_dquots list and will schedule a delayed
workfn which will process the releasing_dquots list.
Hi Ritesh,
Ohh right, thanks for the context. I see this was done here:
dabc8b207566 quota: fix dqput() to follow the guarantees dquot_srcu
should provide
Which went in v6.5. Let me cc Baokun as well.
Hello Ojaswin,
Nice catch! Thanks for fixing this up!
Have you tested the performance impact of this patch? It looks like the
unconditional call to flush_delayed_work() in dquot_writeback_dquots()
may have some performance impact for frequent chown/sync scenarios.
When calling release_dquot(), we will only remove the quota of an object
(user/group/project) from disk if it is not quota-limited and does not
use any inode or block.
Asynchronous removal is now much more performance friendly, not only does
it make full use of the multi-core, but for scenarios where we have to
repeatedly chown between two objects, delayed release avoids the need to
repeatedly allocate/free space in memory and on disk.
Overall, since the actual dirty data is already on the disk, there is no
consistency issue here as it is just clearing unreferenced quota on the
disk, so I thought maybe it would be better to call flush_delayed_work()
in the freeze context.
Thanks,
Baokun
And so after the final dqput and before the release_workfn gets
scheduled, if dquot gets marked as dirty or dquot_transfer gets called -
then I am suspecting that it could lead to a dirty or an active dquot.
Hence, flushing the delayed quota_release_work at the end of
dquot_writeback_dquots() looks like the right thing to do IMO.
But I can give another look as this part of the code is not that well
known to me.
Reported-by: Disha Goel <disgoel@xxxxxxxxxxxxx>
Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
---
Maybe a fixes tag as well?
Right, I'll add that in the next revision. I believe it would be:
Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide")
Regards,
ojaswin
fs/quota/dquot.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 3dd8d6f27725..2782cfc8c302 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -729,6 +729,8 @@ int dquot_writeback_dquots(struct super_block *sb, int type)
sb->dq_op->write_info(sb, cnt);
dqstats_inc(DQST_SYNCS);
+ flush_delayed_work("a_release_work);
+
return ret;
}
EXPORT_SYMBOL(dquot_writeback_dquots);
--
2.43.5