On Mon, Mar 14, 2016 at 10:02:45PM +0100, Christoph Hellwig wrote:
> Add infrastructure for multipage buffered writes.  This is implemented
> using a main iterator that applies an actor function to a range that
> can be written.
>
> This infrastructure is used to implement a buffered write helper, one
> to zero file ranges and one to implement the ->page_mkwrite VM
> operation.  All of them borrow a fair amount of code from fs/buffer.c
> for now by using an internal version of __block_write_begin that
> gets passed an iomap and builds the corresponding buffer head.
>
> The file system gets a set of paired ->iomap_begin and ->iomap_end
> calls which allow it to map/reserve a range and get a notification
> once the write code is finished with it.
>
> Based on earlier code from Dave Chinner.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>

.....

> +/*
> + * Execute an iomap write on a segment of the mapping that spans a
> + * contiguous range of pages that have identical block mapping state.
> + *
> + * This avoids the need to map pages individually, to do individual
> + * allocations for each page and, most importantly, the need for
> + * filesystem specific locking per page.  Instead, all the operations
> + * are amortised over the entire range of pages.  It is assumed that
> + * the filesystems will lock whatever resources they require in the
> + * iomap_begin call, and release them in the iomap_end call.
> + */
> +static ssize_t
> +iomap_write_segment(struct inode *inode, loff_t pos, ssize_t length,
> +		unsigned flags, struct iomap_ops *ops, void *data,
> +		write_actor_t actor)

This requires external iteration to write the entire range if the
allocation does not cover the full length requested (i.e. written <
length).  Also, if the actor returns an error in "written", that error
gets ignored and the return status is whatever the ->iomap_end call
returns.
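In other words, every caller ends up needing a loop along these lines.
This is a rough, untested sketch against the signature quoted above;
iomap_apply_range() is just an invented name for illustration:

/*
 * Sketch only: iterate iomap_write_segment() until the whole range has
 * been covered, since a single call may only map and write part of the
 * requested length.
 */
static ssize_t
iomap_apply_range(struct inode *inode, loff_t pos, ssize_t length,
		unsigned flags, struct iomap_ops *ops, void *data,
		write_actor_t actor)
{
	ssize_t written, total = 0;

	while (length > 0) {
		written = iomap_write_segment(inode, pos, length,
				flags, ops, data, actor);
		if (written < 0)
			return written;
		if (written == 0)
			break;	/* no progress, avoid looping forever */
		pos += written;
		length -= written;
		total += written;
	}
	return total;
}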
....

> +int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
> +		struct iomap_ops *ops)
> +{
> +	struct page *page = vmf->page;
> +	struct inode *inode = file_inode(vma->vm_file);
> +	unsigned long length;
> +	loff_t size;
> +	int ret;
> +
> +	lock_page(page);
> +	size = i_size_read(inode);
> +	if ((page->mapping != inode->i_mapping) ||
> +	    (page_offset(page) > size)) {
> +		/* We overload EFAULT to mean page got truncated */
> +		ret = -EFAULT;
> +		goto out_unlock;
> +	}
> +
> +	/* page is wholly or partially inside EOF */
> +	if (((page->index + 1) << PAGE_CACHE_SHIFT) > size)
> +		length = size & ~PAGE_CACHE_MASK;
> +	else
> +		length = PAGE_CACHE_SIZE;
> +
> +	ret = iomap_write_segment(inode, page_offset(page), length,
> +			IOMAP_ALLOCATE, ops, page, iomap_page_mkwrite_actor);
> +	if (unlikely(ret < 0))
> +		goto out_unlock;
> +	set_page_dirty(page);
> +	wait_for_stable_page(page);
> +	return 0;
> +out_unlock:
> +	unlock_page(page);
> +	return ret;
> +}

Because we don't handle short segment writes here, iomap_page_mkwrite()
fails to allocate blocks on partial pages when block size < page size.
This can be seen with generic/030 on XFS with a 1k block size.

The patch below fixes the issue, as well as the fact that
iomap_page_mkwrite_actor() needs to return the count of bytes "written"
rather than zero on success for iomap_write_segment() to do the right
thing on multi-segment writes.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

iomap: fix page_mkwrite on bs < ps

Fixes generic/030.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/iomap.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index d4528cb..c4d3511 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -337,10 +337,16 @@ iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, ssize_t length,
 	int ret;
 
 	ret = __block_write_begin_int(page, 0, length, NULL, iomap);
-	if (!ret)
-		ret = block_commit_write(page, 0, length);
+	if (ret)
+		return ret;
+
+	/*
+	 * block_commit_write always returns 0, we need to return the length we
+	 * successfully allocated.
+	 */
+	block_commit_write(page, 0, length);
+	return length;
 
-	return ret;
 }
 
 int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
@@ -350,7 +356,8 @@ int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 	struct inode *inode = file_inode(vma->vm_file);
 	unsigned long length;
 	loff_t size;
-	int ret;
+	loff_t offset;
+	ssize_t ret;
 
 	lock_page(page);
 	size = i_size_read(inode);
@@ -367,10 +374,17 @@ int iomap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 	else
 		length = PAGE_CACHE_SIZE;
 
-	ret = iomap_write_segment(inode, page_offset(page), length,
-			IOMAP_ALLOCATE, ops, page, iomap_page_mkwrite_actor);
-	if (unlikely(ret < 0))
-		goto out_unlock;
+	offset = page_offset(page);
+	while (length > 0) {
+		ret = iomap_write_segment(inode, offset, length,
+				IOMAP_ALLOCATE, ops, page,
+				iomap_page_mkwrite_actor);
+		if (unlikely(ret < 0))
+			goto out_unlock;
+		offset += ret;
+		length -= ret;
+	}
+
 	set_page_dirty(page);
 	wait_for_stable_page(page);
 	return 0;
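For anyone wiring this up: the ->iomap_begin/->iomap_end pairing
described above means a filesystem hooks the helper in roughly like so.
Illustrative sketch only -- the myfs_* names are invented, and a real
filesystem would also wrap the fault in its own locking and in
sb_start_pagefault()/sb_end_pagefault() annotations:

/*
 * Sketch: hook the iomap page_mkwrite helper into a filesystem's
 * file mmap path.  myfs_iomap_begin/myfs_iomap_end are assumed to be
 * defined by the filesystem to map/reserve a range and to be notified
 * once the write code is finished with it.
 */
static struct iomap_ops myfs_iomap_ops = {
	.iomap_begin	= myfs_iomap_begin,	/* map/reserve the range */
	.iomap_end	= myfs_iomap_end,	/* called when the write is done */
};

static int
myfs_filemap_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	return iomap_page_mkwrite(vma, vmf, &myfs_iomap_ops);
}

static const struct vm_operations_struct myfs_file_vm_ops = {
	.fault		= filemap_fault,
	.page_mkwrite	= myfs_filemap_page_mkwrite,
};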