On Tue 28-03-23 11:48:53, Baokun Li wrote: > We got a WARNING in ext4_add_complete_io: > ================================================================== > WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250 > CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 #85 > RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4] > [...] > Call Trace: > <TASK> > ext4_end_bio+0xa8/0x240 [ext4] > bio_endio+0x195/0x310 > blk_update_request+0x184/0x770 > scsi_end_request+0x2f/0x240 > scsi_io_completion+0x75/0x450 > scsi_finish_command+0xef/0x160 > scsi_complete+0xa3/0x180 > blk_complete_reqs+0x60/0x80 > blk_done_softirq+0x25/0x40 > __do_softirq+0x119/0x4c8 > run_ksoftirqd+0x42/0x70 > smpboot_thread_fn+0x136/0x3c0 > kthread+0x140/0x1a0 > ret_from_fork+0x2c/0x50 > ================================================================== > > Above issue may happen as follows: > > cpu1 cpu2 > ----------------------------|---------------------------- > mount -o dioread_lock > ext4_writepages > ext4_do_writepages > *if (ext4_should_dioread_nolock(inode))* > // rsv_blocks is not assigned here > mount -o remount,dioread_nolock > ext4_journal_start_with_reserve > __ext4_journal_start > __ext4_journal_start_sb > jbd2__journal_start > *if (rsv_blocks)* > // h_rsv_handle is not initialized here > mpage_map_and_submit_extent > mpage_map_one_extent > dioread_nolock = ext4_should_dioread_nolock(inode) > if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN)) > mpd->io_submit.io_end->handle = handle->h_rsv_handle > ext4_set_io_unwritten_flag > io_end->flag |= EXT4_IO_END_UNWRITTEN > // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag > > scsi_finish_command > scsi_io_completion > scsi_io_completion_action > scsi_end_request > blk_update_request > req_bio_endio > bio_endio > bio->bi_end_io > ext4_end_bio > ext4_put_io_end_defer > ext4_add_complete_io > // trigger WARN_ON(!io_end->handle && sbi->s_journal); > > The immediate cause of this problem is that ext4_should_dioread_nolock() > function returns inconsistent values in the ext4_do_writepages() and > mpage_map_one_extent(). There are four conditions in this function that > can be changed at mount time to cause this problem. These four conditions > can be divided into two categories: > > (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl > (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount > > The two in the first category have been fixed by commit c8585c6fcaf2 > ("ext4: fix races between changing inode journal mode and ext4_writepages") > and commit cb85f4d23f79 ("ext4: fix race between writepages and enabling > EXT4_EXTENTS_FL") respectively. > > Two cases in the other category have not yet been fixed, and the above > issue is caused by this situation. We refer to the fix for the first > category, when applying options during remount, we grab s_writepages_rwsem > to avoid racing with writepages ops to trigger this problem. > > Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io") > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx> Looks good to me. Feel free to add: Reviewed-by: Jan Kara <jack@xxxxxxx> Honza > --- > V1->V2: > Grab s_writepages_rwsem unconditionally during remount. > Remove patches 1,2 that are no longer needed. > > fs/ext4/ext4.h | 3 ++- > fs/ext4/super.c | 9 +++++++++ > 2 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index 9b2cfc32cf78..5f5ee0c20673 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -1703,7 +1703,8 @@ struct ext4_sb_info { > > /* > * Barrier between writepages ops and changing any inode's JOURNAL_DATA > - * or EXTENTS flag. > + * or EXTENTS flag or between writepages ops and changing DELALLOC or > + * DIOREAD_NOLOCK mount options on remount. > */ > struct percpu_rw_semaphore s_writepages_rwsem; > struct dax_device *s_daxdev; > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index e6d84c1e34a4..607ebf2a008b 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -6403,7 +6403,16 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb) > > } > > + /* > + * Changing the DIOREAD_NOLOCK or DELALLOC mount options may cause > + * two calls to ext4_should_dioread_nolock() to return inconsistent > + * values, triggering WARN_ON in ext4_add_complete_io(). we grab > + * here s_writepages_rwsem to avoid race between writepages ops and > + * remount. > + */ > + percpu_down_write(&sbi->s_writepages_rwsem); > ext4_apply_options(fc, sb); > + percpu_up_write(&sbi->s_writepages_rwsem); > > if ((old_opts.s_mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) ^ > test_opt(sb, JOURNAL_CHECKSUM)) { > -- > 2.31.1 > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR