On 12/12/19 3:34 PM, Dave Chinner wrote:
> On Wed, Dec 11, 2019 at 08:29:43AM -0700, Jens Axboe wrote:
>> This adds support for RWF_UNCACHED for file systems using iomap to
>> perform buffered writes. We use the generic infrastructure for this,
>> by tracking pages we created and calling write_drop_cached_pages()
>> to issue writeback and prune those pages.
>>
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>> ---
>>  fs/iomap/apply.c       | 24 ++++++++++++++++++++++++
>>  fs/iomap/buffered-io.c | 37 +++++++++++++++++++++++++++++--------
>>  include/linux/iomap.h  |  5 +++++
>>  3 files changed, 58 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
>> index 562536da8a13..966826ad4bb9 100644
>> --- a/fs/iomap/apply.c
>> +++ b/fs/iomap/apply.c
>> @@ -90,5 +90,29 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
>>  			flags, &iomap);
>>  	}
>>
>> +	if (written && (flags & IOMAP_UNCACHED)) {
>> +		struct address_space *mapping = inode->i_mapping;
>> +
>> +		end = pos + written;
>> +		ret = filemap_write_and_wait_range(mapping, pos, end);
>> +		if (ret)
>> +			goto out;
>> +
>> +		/*
>> +		 * No pages were created for this range, we're done
>> +		 */
>> +		if (!(iomap.flags & IOMAP_F_PAGE_CREATE))
>> +			goto out;
>> +
>> +		/*
>> +		 * Try to invalidate cache pages for the range we just wrote.
>> +		 * We don't care if invalidation fails as the write has still
>> +		 * worked and leaving clean uptodate pages in the page cache
>> +		 * isn't a corruption vector for uncached IO.
>> +		 */
>> +		invalidate_inode_pages2_range(mapping,
>> +				pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
>> +	}
>> +out:
>>  	return written ? written : ret;
>>  }
>
> Just a thought on further optimisation for this for XFS.
> IOMAP_UNCACHED is being passed into the filesystem ->iomap_begin
> methods by iomap_apply(). Hence the filesystems know that it is
> an uncached IO that is being done, and we can tailor allocation
> strategies to suit the fact that the data is going to be written
> immediately.
>
> In this case, XFS needs to treat it the same way it treats direct
> IO. That is, we do immediate unwritten extent allocation rather than
> delayed allocation. This will reduce the allocation overhead and
> will optimise for immediate IO locality rather than optimise for
> delayed allocation.
>
> This should just be a relatively simple change to
> xfs_file_iomap_begin() along the lines of:
>
> -	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) &&
> -			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> +	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) &&
> +	    !(flags & (IOMAP_DIRECT | IOMAP_UNCACHED)) &&
> +			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
> 		/* Reserve delalloc blocks for regular writeback. */
> 		return xfs_file_iomap_begin_delay(inode, offset, length, flags,
> 				iomap);
> 	}
>
> so that it avoids delayed allocation for uncached IO...

That's very handy! Thanks, I'll add that to the next version. Just out
of curiosity, would you prefer this as a separate patch, or just bundle
it with the iomap buffered RWF_UNCACHED patch? I'm assuming the latter,
and I'll just mention it in the changelog.

-- 
Jens Axboe
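
For context on the interface being discussed: below is a minimal userspace
sketch of how a buffered uncached write would be issued once this series is
applied. It uses pwritev2(2) with the RWF_UNCACHED flag proposed by the
series; the flag only exists with patched kernels and uapi headers, and the
fallback value in the sketch is an illustrative placeholder, not the
authoritative definition.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/uio.h>

#ifndef RWF_UNCACHED
/*
 * Placeholder so the sketch compiles against unpatched headers; the real
 * value comes from the series' include/uapi/linux/fs.h.
 */
#define RWF_UNCACHED	0x00000040
#endif

int main(int argc, char **argv)
{
	char buf[65536];
	struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
	ssize_t ret;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	memset(buf, 'a', sizeof(buf));

	fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * The write still goes through the page cache, but the kernel kicks
	 * off writeback and invalidates the pages it instantiated for the
	 * range, so the data does not linger in the cache afterwards.
	 */
	ret = pwritev2(fd, &iov, 1, 0, RWF_UNCACHED);
	if (ret < 0)
		perror("pwritev2(RWF_UNCACHED)");

	close(fd);
	return ret == (ssize_t)sizeof(buf) ? 0 : 1;
}

On an unpatched kernel the call fails with EOPNOTSUPP (unsupported RWF flag),
which makes it easy to detect the feature at runtime and fall back to a plain
buffered write.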