Re: [PATCH] ext4: defer updating i_disksize until endio

Jan Kara <jack@xxxxxxx> · Mon, 27 Mar 2023 13:34:03 +0200

On Mon 27-03-23 18:28:55, Chung-Chiang Cheng wrote:
> On Mon, Mar 27, 2023 at 5:29 PM Jan Kara <jack@xxxxxxx> wrote:
> >
> > As Zhang Yi already noted in his review, this is expected at least with
> > data=writeback mount option. With data=ordered this should not happen
> > though as the commit of the transaction with i_disksize update will wait
> > for page writeback to complete (this is exactly the reason why data=ordered
> > exists after all). Are you able to observe this problem with data=ordered
> > mount option?
> >
> >                                                                 Honza
> 
> It's a pity that this issue also occurs with data=ordered due to delayed
> allocation being enabled by default. If delayed allocation were disabled,
> it would not be as easy to reproduce.

Ah, OK. With data=ordered and a write that extends the file within its last
block, you are right: you can see zeros at the end of the file after a
crash. We discussed this in the past but decided not to fix it because the
fix would have a performance cost we didn't want to impose on users.

> This is because if data is written to the end of a file and the block is
> allocated, the new i_disksize will be immediately committed to the journal
> at ext4_da_write_end(), but the writeback procedure is not yet triggered.
> By default, ext4 commits the journal every 5 seconds, but a dirty page may
> not be written back until 30 seconds later. This is not a short time window,
> and any improper shutdown during this time may lead to the issue :(

Yeah, I agree. The time window is not small. What we could do, and what
could even bring some performance benefit, is move the i_disksize update
from ext4_da_write_end() to ext4_do_writepages(). Currently we do the
i_disksize update only in mpage_map_and_submit_extent(), but we could add
similar logic when exiting from ext4_do_writepages() to update i_disksize
for written-back pages beyond i_disksize which didn't need block
allocation. *Except* there is a problem: we couldn't do this i_disksize
update when the pages are written from jbd2 during ordered data writeback
(we cannot start a transaction in that context). And this is nasty because
we would completely lose the i_disksize update. We could handle it by
redirtying the tail page in this case but it gets a bit ugly...
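
To make the idea above a bit more concrete, here is a minimal user-space
sketch of the decision that would have to be made when exiting writeback.
It is only a toy model, not kernel code: struct toy_inode, enum toy_wb_ctx
and toy_update_disksize() are hypothetical names made up for illustration,
and a real change would also have to deal with locking, the journal handle
and the actual redirtying of the tail page.

```c
#include <stdbool.h>

/* Toy user-space model; none of these names exist in the kernel. */
struct toy_inode {
	long long i_size;     /* in-memory file size */
	long long i_disksize; /* size the journal would record */
};

enum toy_wb_ctx {
	TOY_WB_FLUSHER,      /* regular writeback: may start a transaction */
	TOY_WB_JBD2_ORDERED, /* ordered data writeback from jbd2: may not */
};

/*
 * Modelled "exit from writepages" step. written_end is the end offset of
 * the data written back in this pass.
 *
 * Returns true if i_disksize already covers the written data or was
 * updated here. Returns false if the update was needed but could not be
 * done in this context, so the caller would have to redirty the tail page.
 */
static bool toy_update_disksize(struct toy_inode *inode, long long written_end,
				enum toy_wb_ctx ctx)
{
	/* Never advertise more on disk than the file logically contains. */
	long long new_size = written_end < inode->i_size ?
			     written_end : inode->i_size;

	if (new_size <= inode->i_disksize)
		return true;	/* nothing beyond i_disksize was written */

	if (ctx == TOY_WB_JBD2_ORDERED)
		return false;	/* assumed: no transaction can be started here */

	inode->i_disksize = new_size;
	return true;
}
```

With i_size = 5000 and i_disksize = 4096, a pass from the jbd2 context
returns false and leaves i_disksize alone, while the same pass from the
regular flusher context moves i_disksize to 5000.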

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR