Hi, Ted and Jan, what do you think about this solution ? Thanks, Yi. On 2020/6/20 10:54, zhangyi (F) wrote: > Changes since v2: > - Christoph against the solution of adding callback in the block layer > that could let ext4 handle write error. So for simplicity, switch to > check the bdev mapping->wb_err when ext4 getting journal write access > as Jan suggested now. Maybe we could implement the callback through > introduce a special inode (e.g. a meta inode) for ext4 in the future. > - Patch 1: Add mapping->wb_err check and invoke ext4_error_err() in > ext4_journal_get_write_access() if wb_err is different from the > original one saved at mount time. > - Patch 2-3: Remove partial fix <7963e5ac90125> and <9c83a923c67d>. > - Patch 4: Fix another inconsistency problem since we may bypass the > journal's checkpoint procedure if we free metadata buffers which > were failed to async write out. > - Patch 5: Just a cleanup patch. > > The above 5 patches are based on linux-5.8-rc1 and have been tested by > xfstests, no newly increased failures. > > Thanks, > Yi. > > ----------------------- > > Original background > =================== > > This patch set point to fix the inconsistency problem which has been > discussed and partial fixed in [1]. > > Now, the problem is on the unstable storage which has a flaky transport > (e.g. iSCSI transport may disconnect few seconds and reconnect due to > the bad network environment), if we failed to async write metadata in > background, the end write routine in block layer will clear the buffer's > uptodate flag, but the data in such buffer is actually uptodate. Finally > we may read "old && inconsistent" metadata from the disk when we get the > buffer later because not only the uptodate flag was cleared but also we > do not check the write io error flag, or even worse the buffer has been > freed due to memory presure. > > Fortunately, if the jbd2 do checkpoint after async IO error happens, > the checkpoint routine will check the write_io_error flag and abort the > the journal if detect IO error. And in the journal recover case, the > recover code will invoke sync_blockdev() after recover complete, it will > also detect IO error and refuse to mount the filesystem. > > Current ext4 have already deal with this problem in __ext4_get_inode_loc() > and commit 7963e5ac90125 ("ext4: treat buffers with write errors as > containing valid data"), but it's not enough. > > [1] https://lore.kernel.org/linux-ext4/20190823030207.GC8130@xxxxxxx/ > > > zhangyi (F) (5): > ext4: abort the filesystem if failed to async write metadata buffer > ext4: remove ext4_buffer_uptodate() > ext4: remove write io error check before read inode block > jbd2: abort journal if free a async write error metadata buffer > jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers() > > fs/ext4/ext4.h | 16 +++------------- > fs/ext4/ext4_jbd2.c | 25 +++++++++++++++++++++++++ > fs/ext4/inode.c | 15 +++------------ > fs/ext4/super.c | 23 ++++++++++++++++++++--- > fs/jbd2/transaction.c | 20 ++++++++++++++------ > include/linux/jbd2.h | 2 +- > 6 files changed, 66 insertions(+), 35 deletions(-) >