Re: [PATCH] ext4: defer updating i_disksize until endio

Zhang Yi <yi.zhang@xxxxxxxxxx> · Mon, 27 Mar 2023 19:17:48 +0800

On 2023/3/27 18:28, Chung-Chiang Cheng wrote:
> On Mon, Mar 27, 2023 at 5:29 PM Jan Kara <jack@xxxxxxx> wrote:
>>
>> As Zhang Yi already noted in his review, this is expected at least with
>> data=writeback mount option. With data=ordered this should not happen
>> though as the commit of the transaction with i_disksize update will wait
>> for page writeback to complete (this is exactly the reason why data=ordered
>> exists after all). Are you able to observe this problem with data=ordered
>> mount option?
>>
>>                                                                 Honza
> 
> It's a pity that this issue also occurs with data=ordered due to delayed
> allocation being enabled by default. If delayed allocation were disabled,
> it would not be as easy to reproduce.
> 
> This is because if data is written to the end of a file and the block is
> allocated, the new i_disksize will be immediately committed to the journal
> at ext4_da_write_end(), but the writeback procedure is not yet triggered.
> By default, ext4 commits the journal every 5 seconds, but a dirty page may
> not be written back until 30 seconds later. This is not a short time window,
> and any improper shutdown during this time may lead to the issue :(
> 

It seems that the case you've mentioned is an intra-block append write (no?).
The current data=ordered mode doesn't help in this case because
ext4_map_blocks() doesn't attach the inode to the t_inode_list of the running
transaction. If delayed allocation were disabled, the data loss window would
still be there, because ext4_write_end()->ext4_update_inode_size() also updates
i_disksize before the data is written back. This at least guarantees that no
stale data is exposed. We discussed this in [1].

[1]. https://lore.kernel.org/linux-ext4/1554370192-113254-1-git-send-email-yi.zhang@xxxxxxxxxx/
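
To make "intra-block append write" concrete, here is a rough userspace sketch
of the condition I have in mind. is_intra_block_append() is a hypothetical
helper for illustration only, not a kernel function, and it assumes the last
block of the file is already allocated (no hole, no pending delalloc extent):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not part of ext4).
 *
 * Returns true when appending `len` bytes at offset `old_size` ends inside
 * the same block that already holds the last byte of the file. Assuming that
 * block is already allocated, such a write needs no new block allocation.
 */
static bool is_intra_block_append(uint64_t old_size, uint64_t len,
				  uint64_t blocksize)
{
	uint64_t last_old_block, last_new_block;

	/* Nothing appended, empty file (no last block), or bogus block size. */
	if (len == 0 || old_size == 0 || blocksize == 0)
		return false;

	last_old_block = (old_size - 1) / blocksize;
	last_new_block = (old_size + len - 1) / blocksize;

	return last_old_block == last_new_block;
}
```

With a 4096-byte block size, appending 100 bytes to a 1000-byte file is
intra-block, while appending 1 byte to a 4096-byte file is not, since the new
byte lands in the next block.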

Thanks,
Yi.