On Fri 09-09-22 14:27:36, Zhihao Cheng wrote: > Following process may lead to fs corruption: > 1. ext4_create(dir/foo) > ext4_add_nondir > ext4_add_entry > ext4_dx_add_entry > a. add_dirent_to_buf > ext4_mark_inode_dirty > ext4_handle_dirty_metadata // dir inode bh is recorded into journal > b. ext4_append // dx_get_count(entries) == dx_get_limit(entries) > ext4_bread(EXT4_GET_BLOCKS_CREATE) > ext4_getblk > ext4_map_blocks > ext4_ext_map_blocks > ext4_mb_new_blocks > dquot_alloc_block > dquot_alloc_space_nodirty > inode_add_bytes // update dir's i_blocks > ext4_ext_insert_extent > ext4_ext_dirty // record extent bh into journal > ext4_handle_dirty_metadata(bh) > // record new block into journal > inode->i_size += inode->i_sb->s_blocksize // new size(in mem) > c. ext4_handle_dirty_dx_node(bh2) > // record dir's new block(dx_node) into journal > d. ext4_handle_dirty_dx_node((frame - 1)->bh) > e. ext4_handle_dirty_dx_node(frame->bh) > f. do_split // ret err! > g. add_dirent_to_buf > ext4_mark_inode_dirty(dir) // update raw_inode on disk(skipped) > 2. fsck -a /dev/sdb > drop last block(dx_node) which beyonds dir's i_size. > /dev/sdb: recovering journal > /dev/sdb contains a file system with errors, check forced. > /dev/sdb: Inode 12, end of extent exceeds allowed value > (logical block 128, physical block 3938, len 1) > 3. fsck -fn /dev/sdb > dx_node->entry[i].blk > dir->i_size > Pass 2: Checking directory structure > Problem in HTREE directory inode 12 (/dir): bad block number 128. > Clear HTree index? no > Problem in HTREE directory inode 12: block #3 has invalid depth (2) > Problem in HTREE directory inode 12: block #3 has bad max hash > Problem in HTREE directory inode 12: block #3 not referenced > > Just like make_indexed_dir() does, update dir inode if error occurs. > Fetch a reproducer in [Link]. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216466 > CC: stable@xxxxxxxxxxxxxxx > Signed-off-by: Zhihao Cheng <chengzhihao1@xxxxxxxxxx> Thanks for the analysis and the fix! In principle it looks fine but overall we seem to be defering directory dirtying too much in ext4 directory handling code and as a result things are too fragile as you've noticed. By looking through fs/ext4/namei.c I've found several more places that have exactly the same problem as you're fixing here. I think that specifically for these problems with ext4_append() the best solution is to mark inode dirty directly inside ext4_append(). Sure it will result in more copying of inode data into the journal but growing directory size is not that performance critical operation so I think the code simplicity is worth the extra CPU cycles. Honza > --- > fs/ext4/namei.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c > index 3a31b662f661..f04871fa4ead 100644 > --- a/fs/ext4/namei.c > +++ b/fs/ext4/namei.c > @@ -2617,6 +2617,13 @@ static int ext4_dx_add_entry(handle_t *handle, struct ext4_filename *fname, > */ > if (restart && err == 0) > goto again; > + /* > + * Even if the dx_add_entry failed, we have to properly write > + * out all the changes we did so far. Otherwise we can end up > + * with corrupted filesystem. > + */ > + if (err) > + ext4_mark_inode_dirty(handle, dir); > return err; > } > > -- > 2.31.1 > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR