[PATCH 2/3] xfs: failed delalloc conversion results in bad extent lists

From: Dave Chinner <dchinner@xxxxxxxxxx>

If we fail delayed allocation conversion because we cannot allocate
blocks, we end up in the situation where the inode extent list is
effectively corrupt and unresolvable. Whilst we have dirty data in
memory that we cannot allocate space for, we cannot write that data
back to disk. Unmounting a filesystem in this state results in data
loss.

In situations where we end up with a corrupt extent list in memory,
we can also be asked to convert a delayed region that does not have
a delalloc extent backing it. This should be considered a
corruption, too, not a "try again later" error.

These conversion failures result in the inode being sick and needing
repair, but we don't mark the inode sick in all the conditions that
can lead to bmap sickness. Make sure that the error conditions that
indicate corruption are properly marked.
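
As a rough illustration of the error classification described above,
here is a minimal user-space sketch. Everything in it is a
hypothetical stand-in (the model_* names, the enum values and the
bmap_sick flags are not kernel definitions); it only mirrors the
decision logic: a missing delalloc extent is retryable for the COW
fork and corruption otherwise, and an allocation failure that reports
corruption or a bad CRC also marks the fork sick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the fork identifiers and error codes. */
enum model_fork { MODEL_DATA_FORK = 0, MODEL_COW_FORK = 1 };
enum model_err {
	MODEL_OK = 0,
	MODEL_EAGAIN = -1,
	MODEL_ENOSPC = -2,
	MODEL_EFSCORRUPTED = -3,
	MODEL_EFSBADCRC = -4,
};

/* Hypothetical inode model: one "bmap sick" flag per fork. */
struct model_inode {
	bool bmap_sick[2];
};

/*
 * No delalloc extent backs the range we were asked to convert.
 * Only the COW fork can legitimately race with another thread, so
 * every other fork is treated as corrupt and marked sick.
 */
int model_missing_delalloc(struct model_inode *ip, enum model_fork fork)
{
	assert(ip != NULL);
	if (fork != MODEL_COW_FORK) {
		ip->bmap_sick[fork] = true;
		return MODEL_EFSCORRUPTED;
	}
	return MODEL_EAGAIN;
}

/*
 * The allocation itself failed. Corruption-class errors mark the
 * fork sick; other errors (ENOSPC, for example) pass through
 * without touching the health state.
 */
int model_allocate_failed(struct model_inode *ip, enum model_fork fork,
			  int error)
{
	assert(ip != NULL);
	if (error == MODEL_EFSCORRUPTED || error == MODEL_EFSBADCRC)
		ip->bmap_sick[fork] = true;
	return error;
}
```

In this model an ENOSPC failure leaves the health flags alone, while
a corruption error on the data fork sets bmap_sick[MODEL_DATA_FORK].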

Further, if we trip over these corruption conditions, we then have
to reclaim an inode that still has unresolvable delayed allocation
extents attached to it. This generally happens at unmount, and inode
inactivation will fire assert failures because we've left stray
delayed allocation extents behind on the inode. Hence we need to
ensure that we only trigger the stale delalloc extent checks if the
inode is fully healthy.
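
The condition for running the stale delalloc checks can be sketched
as a plain predicate. This is a user-space model with hypothetical
names (model_reclaim_state and its fields are illustrative, not
kernel structures): the checks only make sense when blocks are still
reserved, the inode is healthy, and the filesystem is not shut down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the state consulted at reclaim time. */
struct model_reclaim_state {
	uint64_t delayed_blks;	/* delalloc blocks still on the inode */
	bool inode_healthy;	/* no sickness recorded for the inode */
	bool fs_shutdown;	/* filesystem has been shut down */
};

/*
 * Stray delalloc blocks are only a bug worth asserting on when the
 * inode is healthy and the filesystem is still running. A sick inode
 * or a shut down filesystem is expected to leave them behind.
 */
bool model_should_check_stray_delalloc(const struct model_reclaim_state *s)
{
	assert(s != NULL);
	return s->delayed_blks != 0 && s->inode_healthy && !s->fs_shutdown;
}
```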

Even then, this may not be enough, because the inactivation code
assumes that there will be no stray delayed extents unless the
filesystem is shut down. If the inode is unhealthy, we need to
ensure we clean up delayed allocation extents within EOF because
the VFS has already tossed the data away. Hence there's no longer
any data over the delalloc extents to write back, so we need to also
toss the delayed allocation extents to ensure we release the space
reservation the delalloc extent holds. Failure to punch delalloc
extents in this case results in assert failures during unmount when
the delalloc block counter is torn down.
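
To illustrate why the punch matters for the delalloc block counter,
here is a deliberately simplified user-space model. All names
(punch_mount, punch_inode, the punch_model_* helpers) are
hypothetical, and it assumes every leftover delalloc block lies
within EOF so that a single punch releases all of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mount-wide counter of reserved delalloc blocks. */
struct punch_mount {
	int64_t delalloc_blks;
};

/* Hypothetical inode holding part of that reservation. */
struct punch_inode {
	struct punch_mount *mp;
	int64_t delayed_blks;
	bool healthy;
};

/* Release the inode's reservation back to the mount-wide counter. */
void punch_model_delalloc(struct punch_inode *ip)
{
	assert(ip != NULL && ip->mp != NULL);
	ip->mp->delalloc_blks -= ip->delayed_blks;
	ip->delayed_blks = 0;
}

/* Inactivation: only a sick inode needs its leftovers punched out. */
void punch_model_inactive(struct punch_inode *ip)
{
	assert(ip != NULL);
	if (!ip->healthy)
		punch_model_delalloc(ip);
}

/* Teardown expects the counter to have drained to zero. */
bool punch_model_counter_clean(const struct punch_mount *mp)
{
	assert(mp != NULL);
	return mp->delalloc_blks == 0;
}
```

In this model, skipping the punch for a sick inode leaves the
mount-wide counter non-zero at teardown, which corresponds to the
assert failure described above.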

This all needs to be in place before the next patch, which resolves a
bug in the iomap code that discards delalloc extents backing dirty
pages on writeback error without discarding the dirty data. Hence we
need to be able to handle delalloc extents in inode cleanup sanely,
rather than rely on incorrectly punching the delalloc extents on the
first writeback error that occurs.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/libxfs/xfs_bmap.c | 13 ++++++++++---
 fs/xfs/xfs_icache.c      |  4 +++-
 fs/xfs/xfs_inode.c       | 10 ++++++++++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 958e4bb2e51e..fb718a5825d5 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4553,8 +4553,12 @@ xfs_bmapi_convert_delalloc(
 		 * should only happen for the COW fork, where another thread
 		 * might have moved the extent to the data fork in the meantime.
 		 */
-		WARN_ON_ONCE(whichfork != XFS_COW_FORK);
-		error = -EAGAIN;
+		if (whichfork != XFS_COW_FORK) {
+			xfs_bmap_mark_sick(ip, whichfork);
+			error = -EFSCORRUPTED;
+		} else {
+			error = -EAGAIN;
+		}
 		goto out_trans_cancel;
 	}
 
@@ -4598,8 +4602,11 @@ xfs_bmapi_convert_delalloc(
 		bma.prev.br_startoff = NULLFILEOFF;
 
 	error = xfs_bmapi_allocate(&bma);
-	if (error)
+	if (error) {
+		if ((error == -EFSCORRUPTED) || (error == -EFSBADCRC))
+			xfs_bmap_mark_sick(ip, whichfork);
 		goto out_finish;
+	}
 
 	error = -ENOSPC;
 	if (WARN_ON_ONCE(bma.blkno == NULLFSBLOCK))
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index ddeaccc04aec..4354b6639dec 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -24,6 +24,7 @@
 #include "xfs_ialloc.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
+#include "xfs_health.h"
 
 #include <linux/iversion.h>
 
@@ -1810,7 +1811,8 @@ xfs_inodegc_set_reclaimable(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct xfs_perag	*pag;
 
-	if (!xfs_is_shutdown(mp) && ip->i_delayed_blks) {
+	if (ip->i_delayed_blks && xfs_inode_is_healthy(ip) &&
+	    !xfs_is_shutdown(mp)) {
 		xfs_check_delalloc(ip, XFS_DATA_FORK);
 		xfs_check_delalloc(ip, XFS_COW_FORK);
 		ASSERT(0);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d354ea2b74f9..06f1229ef628 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -37,6 +37,7 @@
 #include "xfs_reflink.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
+#include "xfs_health.h"
 
 struct kmem_cache *xfs_inode_cache;
 
@@ -1738,6 +1739,15 @@ xfs_inactive(
 		if (xfs_can_free_eofblocks(ip, true))
 			xfs_free_eofblocks(ip);
 
+		/*
+		 * If the inode is sick, then it might have delalloc extents
+		 * within EOF that we were unable to convert. We have to punch
+		 * them out here to release the reservation as there is no
+		 * longer any data to write back into the delalloc range now.
+		 */
+		if (!xfs_inode_is_healthy(ip))
+			xfs_bmap_punch_delalloc_range(ip, 0,
+						i_size_read(VFS_I(ip)));
 		goto out;
 	}
 
-- 
2.39.0



