[PATCH 4/6] xfs: don't screw up direct writes when freesp is fragmented

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Thu, 25 Jan 2018 18:05:12 -0800

From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

xfs_bmap_btalloc is given a range of file offset blocks that must be
allocated to some data/attr/cow fork.  If the fork has an extent size
hint associated with it, the request will be enlarged on both ends to
try to satisfy the alignment hint.  If free space is fragmentated,
sometimes we can allocate some blocks but not enough to fulfill any of
the requested range.  Since bmapi_allocate always trims the new extent
mapping to match the originally requested range, this results in
bmapi_write returning zero and no mapping.

The consequences of this vary -- buffered writes will simply re-call
bmapi_write until it can satisfy at least one block from the original
request.  Direct IO overwrites notice nmaps == 0 and return -ENOSPC
through the dio mechanism out to userspace with the weird result that
writes fail even when we have enough space because the ENOSPC return
overrides any partial write status.  For direct CoW writes the situation
is disastrous because nobody notices us returning an invalid zero-length
wrong-offset mapping to iomap and the write goes off into space.

First of all, teach iomap and xfs_reflink_allocate_cow to check the
mappings being returned and bail out with ENOSPC if we can't get any
blocks.  Second of all, if free space is so fragmented we can't satisfy
even a single block of the original allocation request but did get a few
blocks, we should break the alignment hint in order to guarantee at
least some forward progress for the direct write.  If we return a short
allocation to iomap_apply it'll call back about the remaining blocks.

Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
---
 fs/iomap.c               |    2 +-
 fs/xfs/libxfs/xfs_bmap.c |   15 +++++++++++++++
 fs/xfs/xfs_reflink.c     |    6 ++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index e5de772..aec35a0 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -63,7 +63,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	ret = ops->iomap_begin(inode, pos, length, flags, &iomap);
 	if (ret)
 		return ret;
-	if (WARN_ON(iomap.offset > pos))
+	if (WARN_ON(iomap.offset > pos) || WARN_ON(iomap.length == 0))
 		return -EIO;
 
 	/*
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 032befb..1fb885c 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3386,6 +3386,8 @@ xfs_bmap_btalloc(
 	xfs_agnumber_t	fb_agno;	/* ag number of ap->firstblock */
 	xfs_agnumber_t	ag;
 	xfs_alloc_arg_t	args;
+	xfs_fileoff_t	orig_offset;
+	xfs_extlen_t	orig_length;
 	xfs_extlen_t	blen;
 	xfs_extlen_t	nextminlen = 0;
 	int		nullfb;		/* true if ap->firstblock isn't set */
@@ -3395,6 +3397,8 @@ xfs_bmap_btalloc(
 	int		stripe_align;
 
 	ASSERT(ap->length);
+	orig_offset = ap->offset;
+	orig_length = ap->length;
 
 	mp = ap->ip->i_mount;
 
@@ -3610,6 +3614,17 @@ xfs_bmap_btalloc(
 			*ap->firstblock = args.fsbno;
 		ASSERT(nullfb || fb_agno <= args.agno);
 		ap->length = args.len;
+		/*
+		 * If we didn't get enough blocks to fill even one block of
+		 * the original request, break the alignment and return
+		 * whatever we got; it's the best we can do.  Free space is
+		 * fragmented enough that we cannot honor the extent size
+		 * hints but we can still make some forward progress.
+		 */
+		if (ap->length <= orig_length)
+			ap->offset = orig_offset;
+		else if (ap->offset + ap->length < orig_offset + orig_length)
+			ap->offset = orig_offset + orig_length - ap->length;
 		xfs_bmap_btalloc_accounting(ap, &args);
 	} else {
 		ap->blkno = NULLFSBLOCK;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 9a6c545..9eca8aa 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -464,6 +464,12 @@ xfs_reflink_allocate_cow(
 	error = xfs_trans_commit(tp);
 	if (error)
 		return error;
+	/*
+	 * Allocation succeeded but our request was not even partially
+	 * satisfied?  Bail out.
+	 */
+	if (nimaps == 0)
+		return -ENOSPC;
 convert:
 	return xfs_reflink_convert_cow_extent(ip, imap, offset_fsb, count_fsb,
 			&dfops);

--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html