On Mon, Nov 18, 2024 at 02:15:12PM +0100, Jan Kara wrote: > On Sat 16-11-24 00:04:49, Ojaswin Mujoo wrote: > > One of the paths quota writeback is called from is: > > > > freeze_super() > > sync_filesystem() > > ext4_sync_fs() > > dquot_writeback_dquots() > > > > Since we currently don't always flush the quota_release_work queue in > > this path, we can end up with the following race: > > > > 1. dquot are added to releasing_dquots list during regular operations. > > 2. FS freeze starts, however, this does not flush the quota_release_work queue. > > 3. Freeze completes. > > 4. Kernel eventually tries to flush the workqueue while FS is frozen which > > hits a WARN_ON since transaction gets started during frozen state: > > > > ext4_journal_check_start+0x28/0x110 [ext4] (unreliable) > > __ext4_journal_start_sb+0x64/0x1c0 [ext4] > > ext4_release_dquot+0x90/0x1d0 [ext4] > > quota_release_workfn+0x43c/0x4d0 > > > > Which is the following line: > > > > WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE); > > > > Which ultimately results in generic/390 failing due to dmesg > > noise. This was detected on powerpc machine 15 cores. > > > > To avoid this, make sure to flush the workqueue during > > dquot_writeback_dquots() so we dont have any pending workitems after > > freeze. > > > > Reported-by: Disha Goel <disgoel@xxxxxxxxxxxxx> > > Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx> > > Thanks for debugging this! > > > diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c > > index 3dd8d6f27725..2782cfc8c302 100644 > > --- a/fs/quota/dquot.c > > +++ b/fs/quota/dquot.c > > @@ -729,6 +729,8 @@ int dquot_writeback_dquots(struct super_block *sb, int type) > > sb->dq_op->write_info(sb, cnt); > > dqstats_inc(DQST_SYNCS); > > > > + flush_delayed_work("a_release_work); > > + > > I'd rather do this at the start of dquot_writeback_dquots(). Chances are > this saves some retry loops in the dirty list iterations. That being said I Hi Jan, thanks for review :) > don't think this is enough as I'm thinking about it. iput() can be called > anytime while the filesystem is frozen (just freeze the filesystem and do > echo 3 >/proc/sys/vm/drop_caches) which will consequently call dquot_drop() > -> dqput(). This should not be really freeing the dquot on-disk structure > (the inode itself is still accounted there) but nevertheless it may end up > dropping the last dquot in-memory reference and ext4_release_dquot() will > call ext4_journal_start() and complain. So I think on top of this patch > which makes sense on its own and deals with 99.9% of cases, we also need > ext4 specific fix which uses sb_start_intwrite() to get freeze protection > in ext4_release_dquot() (and in principle we always needed this, delayed > dquot releasing does not influence this particular problem). Some care will > be needed if the transaction is already started when ext4_release_dquot() > is called - you can take inspiration in how ext4_evict_inode() handles > this. That's a good point Jan, this could indeed happen if we drop caches destroying an inode pinned in the lru cache. Thanks for the pointers, I'll try to look into hardening ext4_release_dquot() as you suggested and send a v2. Regards, ojaswin > > Honza > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR