From: Dave Chinner <dchinner@xxxxxxxxxx> If we fail delayed allocation conversion because we cannot allocate blocks, we end up in the situation where the inode extent list is effectively corrupt and unresolvable. Whilst we have dirty data in memory that we cannot allocate space for, we cannot write that data back to disk. Unmounting a filesystem in this state results in data loss. In situations where we end up with a corrupt extent list in memory, we can also be asked to convert a delayed region that does not have a delalloc extent backing it. This should be considered a corruption, too, not a "try again later" error. These conversion failures result in the inode being sick and needing repair, but we don't mark all the conditions that can lead to bmap sickness being marked. Make sure that the error conditions that indicate corruption are properly marked. Further, if we trip over these corruptions conditions, we then have to reclaim an inode that has unresolvable delayed allocation extents attached to the inode. This generally happens at unmount and inode inactivation will fire assert failures because we've left stray delayed allocation extents behind on the inode. Hence we need to ensure that we only trigger the stale delalloc extent checks if the inode is fully healthy. Even then, this may not be enough, because the inactivation code assumes that there will be no stray delayed extents unless the filesystem is shut down. If the inode is unhealthy, we need to ensure we clean up delayed allocation extents within EOF because the VFS has already tossed the data away. Hence there's no longer any data over the delalloc extents to write back, so we need to also toss the delayed allocation extents to ensure we release the space reservation the delalloc extent holds. Failure to punch delalloc extents in this case results in assert failures during unmount when the delalloc block counter is torn down. This all needs to be in place before the next patch which resolves a bug in the iomap code that discards delalloc extents backing dirty pages on writeback error without discarding the dirty data. Hence we need to be able to handle delalloc extents in inode cleanup sanely, rather than rely on incorrectly punching the delalloc extents on the first writeback error that occurs. Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> --- fs/xfs/libxfs/xfs_bmap.c | 13 ++++++++++--- fs/xfs/xfs_icache.c | 4 +++- fs/xfs/xfs_inode.c | 10 ++++++++++ 3 files changed, 23 insertions(+), 4 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 958e4bb2e51e..fb718a5825d5 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4553,8 +4553,12 @@ xfs_bmapi_convert_delalloc( * should only happen for the COW fork, where another thread * might have moved the extent to the data fork in the meantime. */ - WARN_ON_ONCE(whichfork != XFS_COW_FORK); - error = -EAGAIN; + if (whichfork != XFS_COW_FORK) { + xfs_bmap_mark_sick(ip, whichfork); + error = -EFSCORRUPTED; + } else { + error = -EAGAIN; + } goto out_trans_cancel; } @@ -4598,8 +4602,11 @@ xfs_bmapi_convert_delalloc( bma.prev.br_startoff = NULLFILEOFF; error = xfs_bmapi_allocate(&bma); - if (error) + if (error) { + if ((error == -EFSCORRUPTED) || (error == -EFSBADCRC)) + xfs_bmap_mark_sick(ip, whichfork); goto out_finish; + } error = -ENOSPC; if (WARN_ON_ONCE(bma.blkno == NULLFSBLOCK)) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index ddeaccc04aec..4354b6639dec 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -24,6 +24,7 @@ #include "xfs_ialloc.h" #include "xfs_ag.h" #include "xfs_log_priv.h" +#include "xfs_health.h" #include <linux/iversion.h> @@ -1810,7 +1811,8 @@ xfs_inodegc_set_reclaimable( struct xfs_mount *mp = ip->i_mount; struct xfs_perag *pag; - if (!xfs_is_shutdown(mp) && ip->i_delayed_blks) { + if (ip->i_delayed_blks && xfs_inode_is_healthy(ip) && + !xfs_is_shutdown(mp)) { xfs_check_delalloc(ip, XFS_DATA_FORK); xfs_check_delalloc(ip, XFS_COW_FORK); ASSERT(0); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index d354ea2b74f9..06f1229ef628 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -37,6 +37,7 @@ #include "xfs_reflink.h" #include "xfs_ag.h" #include "xfs_log_priv.h" +#include "xfs_health.h" struct kmem_cache *xfs_inode_cache; @@ -1738,6 +1739,15 @@ xfs_inactive( if (xfs_can_free_eofblocks(ip, true)) xfs_free_eofblocks(ip); + /* + * If the inode is sick, then it might have delalloc extents + * within EOF that we were unable to convert. We have to punch + * them out here to release the reservation as there is no + * longer any data to write back into the delalloc range now. + */ + if (!xfs_inode_is_healthy(ip)) + xfs_bmap_punch_delalloc_range(ip, 0, + i_size_read(VFS_I(ip))); goto out; } -- 2.39.0