On Tue 10-08-21 22:27:22, Zhang Yi wrote: > In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate > inode buffer when the inode monopolize an inode block for performance > reason. For most cases, ext4_mark_iloc_dirty() will fill the inode > buffer to make it fine, but we could miss this call if something bad > happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an > empty inode buffer and trigger ext4 error. > > For example, if we remove a nonexistent xattr on inode A, > ext4_xattr_set_handle() will return ENODATA before invoking > ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We > will get checksum error message in ext4_iget() when getting inode again. > > EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid > > Even worse, if we allocate another inode B at the same inode block, it > will corrupt the inode A on disk when write back inode B. > > So this patch clear uptodate flag and mark buffer new if we get an empty > buffer, clear it after we fill inode data or making read IO. > > Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> Thanks for the fix! Really good catch! The patch looks correct but honestly, I'm not very happy about the special buffer_new handling. It looks correct but I'm a bit uneasy that e.g. the block device code can access this buffer and manipulate its state. Cannot we instead e.g. check whether the buffer is uptodate in ext4_mark_iloc_dirty(), if not, lock it, if still not uptodate, zero it, mark as uptodate, unlock it and then call ext4_do_update_inode()? That would seem like a bit more foolproof solution to me. Basically the fact that the buffer is not uptodate in ext4_mark_iloc_dirty() would mean that nobody else is past __ext4_get_inode_loc() for another inode in that buffer and so zeroing is safe. Honza > --- > fs/ext4/inode.c | 27 +++++++++++++++++++++++---- > 1 file changed, 23 insertions(+), 4 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index eae1b2d0b550..1f36289e9ef6 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -4292,6 +4292,18 @@ int ext4_truncate(struct inode *inode) > return err; > } > > +static void ext4_end_inode_loc_read(struct buffer_head *bh, int uptodate) > +{ > + if (buffer_new(bh)) > + clear_buffer_new(bh); > + if (uptodate) > + set_buffer_uptodate(bh); > + else > + clear_buffer_uptodate(bh); > + unlock_buffer(bh); > + put_bh(bh); > +} > + > /* > * ext4_get_inode_loc returns with an extra refcount against the inode's > * underlying buffer_head on success. If 'in_mem' is true, we have all > @@ -4367,9 +4379,11 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, > } > brelse(bitmap_bh); > if (i == start + inodes_per_block) { > - /* all other inodes are free, so skip I/O */ > - memset(bh->b_data, 0, bh->b_size); > - set_buffer_uptodate(bh); > + if (!buffer_new(bh)) { > + /* all other inodes are free, so skip I/O */ > + memset(bh->b_data, 0, bh->b_size); > + set_buffer_new(bh); > + } > unlock_buffer(bh); > goto has_buffer; > } > @@ -4408,7 +4422,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, > * Read the block from disk. > */ > trace_ext4_load_inode(sb, ino); > - ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, NULL); > + ext4_read_bh_nowait(bh, REQ_META | REQ_PRIO, ext4_end_inode_loc_read); > blk_finish_plug(&plug); > wait_on_buffer(bh); > ext4_simulate_fail_bh(sb, bh, EXT4_SIM_INODE_EIO); > @@ -5132,6 +5146,11 @@ static int ext4_do_update_inode(handle_t *handle, > if (err) > goto out_brelse; > ext4_clear_inode_state(inode, EXT4_STATE_NEW); > + if (buffer_new(bh)) { > + clear_buffer_new(bh); > + set_buffer_uptodate(bh); > + } > + > if (set_large_file) { > BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "get write access"); > err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh); > -- > 2.31.1 > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR