On Mon, Mar 14, 2016 at 10:02:45PM +0100, Christoph Hellwig wrote:
> Add infrastructure for multipage buffered writes.  This is implemented
> using a main iterator that applies an actor function to a range that
> can be written.
>
> This infrastructure is used to implement a buffered write helper, one
> to zero file ranges and one to implement the ->page_mkwrite VM
> operation.  All of them borrow a fair amount of code from fs/buffer.c
> for now by using an internal version of __block_write_begin that
> gets passed an iomap and builds the corresponding buffer head.
>
> The file system gets a set of paired ->iomap_begin and ->iomap_end
> calls which allow it to map/reserve a range and get a notification
> once the write code is finished with it.
>
> Based on earlier code from Dave Chinner.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>

.....

> +/*
> + * Execute an iomap write on a segment of the mapping that spans a
> + * contiguous range of pages that have identical block mapping state.
> + *
> + * This avoids the need to map pages individually, to do individual
> + * allocations for each page and, most importantly, the need for
> + * filesystem specific locking per page.  Instead, all the operations
> + * are amortised over the entire range of pages.  It is assumed that
> + * the filesystems will lock whatever resources they require in the
> + * iomap_begin call, and release them in the iomap_end call.
> + */
> +static ssize_t
> +iomap_write_segment(struct inode *inode, loff_t pos, ssize_t length,
> +		unsigned flags, struct iomap_ops *ops, void *data,
> +		write_actor_t actor)

This requires external iteration to write the entire range if the
allocation does not cover the full length requested (i.e. written <
length).  Also, if the actor returns an error in "written", that error
gets ignored and the return status is whatever the ->iomap_end call
returns.
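In other words, every caller ends up needing a loop along these lines.
This is a rough, untested sketch against the signature quoted above;
iomap_apply_range() is just an invented name for illustration:

/*
 * Sketch only: iterate iomap_write_segment() until the whole range has
 * been covered, since a single call may only map and write part of the
 * requested length.
 */
static ssize_t
iomap_apply_range(struct inode *inode, loff_t pos, ssize_t length,
		unsigned flags, struct iomap_ops *ops, void *data,
		write_actor_t actor)
{
	ssize_t written, total = 0;

	while (length > 0) {
		written = iomap_write_segment(inode, pos, length,
				flags, ops, data, actor);
		if (written < 0)
			return written;
		if (written == 0)
			break;	/* no progress, avoid looping forever */
		pos += written;
		length -= written;
		total += written;
	}
	return total;
}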
....

> +int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
> +		struct iomap_ops *ops)
> +{
> +	struct page *page = vmf->page;
> +	struct inode *inode = file_inode(vma->vm_file);
> +	unsigned long length;
> +	loff_t size;
> +	int ret;
> +
> +	lock_page(page);
> +	size = i_size_read(inode);
> +	if ((page->mapping != inode->i_mapping) ||
> +	    (page_offset(page) > size)) {
> +		/* We overload EFAULT to mean page got truncated */
> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
> +
> +	/* page is wholly or partially inside EOF */
> +	if (((page->index + 1) << PAGE_CACHE_SHIFT) > size)
> +		length = size & ~PAGE_CACHE_MASK;
> +	else
> +		length = PAGE_CACHE_SIZE;
> +
> +	ret = iomap_write_segment(inode, page_offset(page), length,
> +			IOMAP_ALLOCATE, ops, page, iomap_page_mkwrite_actor);
> +	if (unlikely(ret < 0))
> +		goto out_unlock;
> +	set_page_dirty(page);
> +	wait_for_stable_page(page);
> +	return 0;
> +out_unlock:
> +	unlock_page(page);
> +	return ret;
> +}

Because we don't handle short segment writes here, iomap_page_mkwrite()
fails to allocate blocks on partial pages when block size < page size.
This can be seen with generic/030 on XFS with a 1k block size.

The patch below fixes the issue, as well as the fact that
iomap_page_mkwrite_actor() needs to return the count of bytes "written"
rather than zero on success for iomap_write_segment() to do the right
thing on multi-segment writes.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

iomap: fix page_mkwrite on bs < ps

Fixes generic/030.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/iomap.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index d4528cb..c4d3511 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -337,10 +337,16 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, ssize_t length,
 	int ret;
 
 	ret = __block_write_begin_int(page, 0, length, NULL, iomap);
-	if (!ret)
-		ret = block_commit_write(page, 0, length);
+	if (ret)
+		return ret;
+
+	/*
+	 * block_commit_write always returns 0, we need to return the length we
+	 * successfully allocated.
+	 */
+	block_commit_write(page, 0, length);
+	return length;
 
-	return ret;
 }
 
 int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
@@ -350,7 +356,8 @@ int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 	struct inode *inode = file_inode(vma->vm_file);
 	unsigned long length;
 	loff_t size;
-	int ret;
+	loff_t offset;
+	ssize_t ret;
 
 	lock_page(page);
 	size = i_size_read(inode);
@@ -367,10 +374,17 @@ int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 	else
 		length = PAGE_CACHE_SIZE;
 
-	ret = iomap_write_segment(inode, page_offset(page), length,
-			IOMAP_ALLOCATE, ops, page, iomap_page_mkwrite_actor);
-	if (unlikely(ret < 0))
-		goto out_unlock;
+	offset = page_offset(page);
+	while (length > 0) {
+		ret = iomap_write_segment(inode, offset, length,
+				IOMAP_ALLOCATE, ops, page,
+				iomap_page_mkwrite_actor);
+		if (unlikely(ret < 0))
+			goto out_unlock;
+		offset += ret;
+		length -= ret;
+	}
+
 	set_page_dirty(page);
 	wait_for_stable_page(page);
 	return 0;
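For anyone wiring this up: the ->iomap_begin/->iomap_end pairing
described above means a filesystem hooks the helper in roughly like so.
Illustrative sketch only -- the myfs_* names are invented, and a real
filesystem would also wrap the fault in its own locking and in
sb_start_pagefault()/sb_end_pagefault() annotations:

/*
 * Sketch: hook the iomap page_mkwrite helper into a filesystem's
 * file mmap path.  myfs_iomap_begin/myfs_iomap_end are assumed to be
 * defined by the filesystem to map/reserve a range and to be notified
 * once the write code is finished with it.
 */
static struct iomap_ops myfs_iomap_ops = {
	.iomap_begin	= myfs_iomap_begin,	/* map/reserve the range */
	.iomap_end	= myfs_iomap_end,	/* called when the write is done */
};

static int
myfs_filemap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	return iomap_page_mkwrite(vma, vmf, &myfs_iomap_ops);
}

static const struct vm_operations_struct myfs_file_vm_ops = {
	.fault		= filemap_fault,
	.page_mkwrite	= myfs_filemap_page_mkwrite,
};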