On Wed, Dec 11, 2019 at 08:29:43AM -0700, Jens Axboe wrote: > This adds support for RWF_UNCACHED for file systems using iomap to > perform buffered writes. We use the generic infrastructure for this, > by tracking pages we created and calling write_drop_cached_pages() > to issue writeback and prune those pages. > > Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> > --- > fs/iomap/apply.c | 24 ++++++++++++++++++++++++ > fs/iomap/buffered-io.c | 37 +++++++++++++++++++++++++++++-------- > include/linux/iomap.h | 5 +++++ > 3 files changed, 58 insertions(+), 8 deletions(-) > > diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c > index 562536da8a13..966826ad4bb9 100644 > --- a/fs/iomap/apply.c > +++ b/fs/iomap/apply.c > @@ -90,5 +90,29 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, > flags, &iomap); > } > > + if (written && (flags & IOMAP_UNCACHED)) { > + struct address_space *mapping = inode->i_mapping; > + > + end = pos + written; > + ret = filemap_write_and_wait_range(mapping, pos, end); > + if (ret) > + goto out; > + > + /* > + * No pages were created for this range, we're done > + */ > + if (!(iomap.flags & IOMAP_F_PAGE_CREATE)) > + goto out; > + > + /* > + * Try to invalidate cache pages for the range we just wrote. > + * We don't care if invalidation fails as the write has still > + * worked and leaving clean uptodate pages in the page cache > + * isn't a corruption vector for uncached IO. > + */ > + invalidate_inode_pages2_range(mapping, > + pos >> PAGE_SHIFT, end >> PAGE_SHIFT); > + } > +out: > return written ? written : ret; > } Just a thought on further optimisation for this for XFS. IOMAP_UNCACHED is being passed into the filesystem ->iomap_begin methods by iomap_apply(). Hence the filesystems know that it is an uncached IO that is being done, and we can tailor allocation strategies to suit the fact that the data is going to be written immediately. In this case, XFS needs to treat it the same way it treats direct IO. That is, we do immediate unwritten extent allocation rather than delayed allocation. This will reduce the allocation overhead and will optimise for immediate IO locality rather than optimise for delayed allocation. This should just be a relatively simple change to xfs_file_iomap_begin() along the lines of: - if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) && - !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) { + if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && + !(flags & (IOMAP_DIRECT | IOMAP_UNCACHED)) && + !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) { /* Reserve delalloc blocks for regular writeback. */ return xfs_file_iomap_begin_delay(inode, offset, length, flags, iomap); } so that it avoids delayed allocation for uncached IO... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx