On Tue 06-07-21 22:40:46, Zhang Yi wrote: > On 2021/7/6 20:11, Jan Kara wrote: > > On Tue 06-07-21 10:42:07, Zhang Yi wrote: > >> After commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <= > >> isize"), i_disksize could always be updated to i_size in ext4_setattr(), > >> and it seems that there is no other way that could appear > >> i_disksize < i_size besides the delalloc write. In the case of delay > > > > Well, there are also direct IO writes which have temporarily i_disksize < > > i_size but when you hold i_rwsem, you're right that delalloc is the only > > reason why you can see i_disksize < i_size AFAIK. > > > >> alloc write, ext4_writepages() could update i_disksize for the new delay > >> allocated blocks properly. So we could switch to check i_size instead > >> of i_disksize in ext4_da_write_end() when write to the end of the file. > > > > I agree that since ext4_da_should_update_i_disksize() needs to return true > > for us to touch i_disksize, writeback has to have already allocated block > > underlying the end of write (new_i_size position) and thus we are > > guaranteed that writeback will also soon update i_disksize after the > > new_i_size position. So I agree that your switch to testing i_size instead > > of i_disksize should not have any bad effect... Thinking about this some > > more why do we need i_disksize update in ext4_da_write_end() at all? The > > page will be dirtied and when writeback will happen we will update > > i_disksize to i_size. Updating i_disksize earlier brings no benefit - the user > > will see zeros instead of valid data if we crash before the writeback > > happened. Am I missing something guys? > > > > Hi, Jan. > > Do you remember the patch and question I asked 2 years ago[1][2]? The > case of new_i_size > i_size && ext4_da_should_update_i_disksize() here > means partial block append write, Agreed. > ext4_writepages() does not update i_disksize for this case now. Doesn't it? Hmm, so mpage_map_and_submit_extent() certainly does make sure we update i_size properly. But you are actually correct that ext4_writepage() does not update i_disksize and neither does mpage_prepare_extent_to_map() which can also writeback fully mapped pages. Changing mpage_prepare_extent_to_map() to handle i_disksize update would be trivial but dealing with ext4_writepage() would be difficult. So yes, let's keep the i_disksize update in ext4_da_write_end() for now. But please add a comment there explaining the situation. Like: /* * Since we are holding inode lock, we are sure i_disksize <= * i_size. We also know that if i_disksize < i_size, there are * delalloc writes pending in the range upto i_size. If the end of * the current write is <= i_size, there's no need to touch * i_disksize since writeback will push i_disksize upto i_size * eventually. If the end of the current write is > i_size and * inside an allocated block (ext4_da_should_update_i_disksize() * check), we need to update i_disksize here as neither * ext4_writepage() nor certain ext4_writepages() paths not * allocating blocks update i_disksize. * * Note that we defer inode dirtying to generic_write_end() / * ext4_da_write_inline_data_end(). */ > And the journal data=ordered mode also > cannot guarantee write data before metadata. So we cannot guarantee we > cannot see zeros where data was written after crash. Yes, but that is IMO somewhat different question. Honza > > Thanks, > Yi. > > [1]https://lore.kernel.org/linux-ext4/20190404101823.GA22313@xxxxxxxxxxxxxx/ > [2]https://lore.kernel.org/linux-ext4/20190405091258.GA1600@xxxxxxxxxxxxxx/ > > > > >> we also could remove ext4_mark_inode_dirty() together because > >> generic_write_end() will dirty the inode. > >> > >> Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx> > >> --- > >> fs/ext4/inode.c | 21 ++++++++------------- > >> 1 file changed, 8 insertions(+), 13 deletions(-) > >> > >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > >> index d8de607849df..6f6a61f3ae5f 100644 > >> --- a/fs/ext4/inode.c > >> +++ b/fs/ext4/inode.c > >> @@ -3087,32 +3087,27 @@ static int ext4_da_write_end(struct file *file, > >> * generic_write_end() will run mark_inode_dirty() if i_size > >> * changes. So let's piggyback the i_disksize mark_inode_dirty > >> * into that. > >> + * > >> + * Check i_size not i_disksize here because ext4_writepages() could > >> + * update i_disksize from i_size for delay allocated blocks properly. > >> */ > >> new_i_size = pos + copied; > >> - if (copied && new_i_size > EXT4_I(inode)->i_disksize) { > >> + if (copied && new_i_size > inode->i_size) { > >> if (ext4_has_inline_data(inode) || > >> - ext4_da_should_update_i_disksize(page, end)) { > >> + ext4_da_should_update_i_disksize(page, end)) > >> ext4_update_i_disksize(inode, new_i_size); > >> - /* We need to mark inode dirty even if > >> - * new_i_size is less that inode->i_size > >> - * bu greater than i_disksize.(hint delalloc) > >> - */ > >> - ret = ext4_mark_inode_dirty(handle, inode); > >> - } > >> } > >> > >> if (write_mode != CONVERT_INLINE_DATA && > >> ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) && > >> ext4_has_inline_data(inode)) > >> - ret2 = ext4_da_write_inline_data_end(inode, pos, len, copied, > >> + ret = ext4_da_write_inline_data_end(inode, pos, len, copied, > >> page); > >> else > >> - ret2 = generic_write_end(file, mapping, pos, len, copied, > >> + ret = generic_write_end(file, mapping, pos, len, copied, > >> page, fsdata); > >> > >> - copied = ret2; > >> - if (ret2 < 0) > >> - ret = ret2; > >> + copied = ret; > >> ret2 = ext4_journal_stop(handle); > >> if (unlikely(ret2 && !ret)) > >> ret = ret2; > >> -- > >> 2.31.1 > >> -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR