Re: [PATCH v3 0/5] ext4: fix inconsistency since async write metadata buffer error

"zhangyi (F)" <yi.zhang@xxxxxxxxxx> · Mon, 13 Jul 2020 09:40:47 +0800

Hi Ted and Jan, what do you think about this solution?

Thanks,
Yi.

On 2020/6/20 10:54, zhangyi (F) wrote:
> Changes since v2:
>  - Christoph is against adding a callback in the block layer that would
>    let ext4 handle write errors. So, for simplicity, switch to checking
>    the bdev mapping->wb_err when ext4 gets journal write access, as Jan
>    suggested. Maybe we could implement the callback in the future by
>    introducing a special inode (e.g. a meta inode) for ext4.
>  - Patch 1: Add mapping->wb_err check and invoke ext4_error_err() in
>    ext4_journal_get_write_access() if wb_err is different from the
>    original one saved at mount time.
>  - Patch 2-3: Remove the partial fixes <7963e5ac90125> and <9c83a923c67d>.
>  - Patch 4: Fix another inconsistency problem: we may bypass the
>    journal's checkpoint procedure if we free metadata buffers whose
>    async write-out failed.
>  - Patch 5: Just a cleanup patch.
>    
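To make the Patch 1 idea easier to discuss, here is a minimal user-space
sketch of the check, not the patch itself. All names (model_mapping,
model_fs, model_get_write_access) are hypothetical, and a plain counter
stands in for the kernel's wb_err tracking, which is an assumption made
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the block device's mapping. */
struct model_mapping {
	unsigned int wb_err;	/* bumped on every failed async write */
};

/* Hypothetical stand-in for the per-filesystem state. */
struct model_fs {
	struct model_mapping *bdev_mapping;
	unsigned int saved_wb_err;	/* value sampled at mount time */
	bool aborted;
};

/* Mount: remember the current error state so older errors are ignored. */
static void model_mount(struct model_fs *fs, struct model_mapping *m)
{
	assert(fs != NULL && m != NULL);
	fs->bdev_mapping = m;
	fs->saved_wb_err = m->wb_err;
	fs->aborted = false;
}

/* Modelled I/O completion path reporting a failed background write. */
static void model_async_write_failed(struct model_mapping *m)
{
	m->wb_err++;
}

/*
 * Modelled journal write access: returns 0 on success, or -1 once a
 * write error newer than the mount-time sample has been observed.
 */
static int model_get_write_access(struct model_fs *fs)
{
	if (fs->aborted)
		return -1;
	if (fs->bdev_mapping->wb_err != fs->saved_wb_err) {
		fs->saved_wb_err = fs->bdev_mapping->wb_err;
		fs->aborted = true;
		fprintf(stderr, "model: async metadata write error, aborting\n");
		return -1;
	}
	return 0;
}
```

In this model, a failure reported between two model_get_write_access()
calls makes the second call fail and marks the filesystem as aborted,
while errors that were already present at mount time are ignored.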
> The above 5 patches are based on linux-5.8-rc1 and have been tested
> with xfstests, with no new failures.
> 
> Thanks,
> Yi.
> 
> -----------------------
> 
> Original background
> ===================
> 
> This patch set aims to fix the inconsistency problem which has been
> discussed and partially fixed in [1].
> 
> The problem occurs on unstable storage with a flaky transport (e.g. an
> iSCSI transport may disconnect for a few seconds and reconnect due to a
> bad network environment). If we fail to write metadata asynchronously
> in the background, the end write routine in the block layer clears the
> buffer's uptodate flag, although the data in the buffer is actually up
> to date. Later, when we get the buffer again, we may read "old &&
> inconsistent" metadata from the disk, because the uptodate flag was
> cleared and we do not check the write io error flag, or, even worse,
> because the buffer has been freed due to memory pressure.
> 
> Fortunately, if jbd2 does a checkpoint after the async IO error happens,
> the checkpoint routine will check the write_io_error flag and abort the
> journal if it detects an IO error. In the journal recovery case, the
> recovery code invokes sync_blockdev() after recovery completes, which
> also detects the IO error and refuses to mount the filesystem.
> 
> Ext4 currently deals with this problem in __ext4_get_inode_loc() and in
> commit 7963e5ac90125 ("ext4: treat buffers with write errors as
> containing valid data"), but that is not enough.
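The failure mode described above, and the idea behind the partial fix, can
be illustrated with a small user-space model. This is only a sketch under
stated assumptions: model_buffer and the model_* helpers are hypothetical
names and are not the kernel's buffer_head API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 16

/* Hypothetical stand-in for a cached metadata buffer. */
struct model_buffer {
	char data[MODEL_BLOCK_SIZE];	/* in-memory (newest) contents */
	bool uptodate;
	bool write_io_error;
};

/* Modelled write completion: a failure clears uptodate, as in the report. */
static void model_end_async_write(struct model_buffer *buf, char *disk, bool ok)
{
	assert(buf != NULL && disk != NULL);
	if (ok) {
		memcpy(disk, buf->data, MODEL_BLOCK_SIZE);
		buf->write_io_error = false;
	} else {
		buf->uptodate = false;
		buf->write_io_error = true;
	}
}

/* Naive lookup: re-read from disk whenever the uptodate flag is clear. */
static const char *model_get_naive(struct model_buffer *buf, const char *disk)
{
	if (!buf->uptodate) {
		/* Overwrites the newer in-memory data with the old block. */
		memcpy(buf->data, disk, MODEL_BLOCK_SIZE);
		buf->uptodate = true;
	}
	return buf->data;
}

/*
 * Checked lookup: a failed write means memory is newer than disk, so
 * treat the buffer as valid instead of re-reading the old block.
 */
static const char *model_get_checked(struct model_buffer *buf, const char *disk)
{
	if (!buf->uptodate && buf->write_io_error)
		buf->uptodate = true;
	return model_get_naive(buf, disk);
}
```

After a failed model_end_async_write(), model_get_naive() returns the old
on-disk contents, while model_get_checked() keeps the newer in-memory
data. Neither helps once the buffer itself has been freed under memory
pressure, which is why the check at the call sites is not enough.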
> 
> [1] https://lore.kernel.org/linux-ext4/20190823030207.GC8130@xxxxxxx/
> 
> 
> zhangyi (F) (5):
>   ext4: abort the filesystem if failed to async write metadata buffer
>   ext4: remove ext4_buffer_uptodate()
>   ext4: remove write io error check before read inode block
>   jbd2: abort journal if free a async write error metadata buffer
>   jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers()
> 
>  fs/ext4/ext4.h        | 16 +++-------------
>  fs/ext4/ext4_jbd2.c   | 25 +++++++++++++++++++++++++
>  fs/ext4/inode.c       | 15 +++------------
>  fs/ext4/super.c       | 23 ++++++++++++++++++++---
>  fs/jbd2/transaction.c | 20 ++++++++++++++------
>  include/linux/jbd2.h  |  2 +-
>  6 files changed, 66 insertions(+), 35 deletions(-)
>