On 12/15/19 9:17 PM, Dave Chinner wrote: > On Thu, Dec 12, 2019 at 05:57:57PM -0700, Jens Axboe wrote: >> On 12/12/19 5:54 PM, Jens Axboe wrote: >>> On 12/12/19 3:34 PM, Dave Chinner wrote: >>>> On Wed, Dec 11, 2019 at 08:29:43AM -0700, Jens Axboe wrote: >>>>> This adds support for RWF_UNCACHED for file systems using iomap to >>>>> perform buffered writes. We use the generic infrastructure for this, >>>>> by tracking pages we created and calling write_drop_cached_pages() >>>>> to issue writeback and prune those pages. >>>>> >>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> >>>>> --- >>>>> fs/iomap/apply.c | 24 ++++++++++++++++++++++++ >>>>> fs/iomap/buffered-io.c | 37 +++++++++++++++++++++++++++++-------- >>>>> include/linux/iomap.h | 5 +++++ >>>>> 3 files changed, 58 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c >>>>> index 562536da8a13..966826ad4bb9 100644 >>>>> --- a/fs/iomap/apply.c >>>>> +++ b/fs/iomap/apply.c >>>>> @@ -90,5 +90,29 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, >>>>> flags, &iomap); >>>>> } >>>>> >>>>> + if (written && (flags & IOMAP_UNCACHED)) { >>>>> + struct address_space *mapping = inode->i_mapping; >>>>> + >>>>> + end = pos + written; >>>>> + ret = filemap_write_and_wait_range(mapping, pos, end); >>>>> + if (ret) >>>>> + goto out; >>>>> + >>>>> + /* >>>>> + * No pages were created for this range, we're done >>>>> + */ >>>>> + if (!(iomap.flags & IOMAP_F_PAGE_CREATE)) >>>>> + goto out; >>>>> + >>>>> + /* >>>>> + * Try to invalidate cache pages for the range we just wrote. >>>>> + * We don't care if invalidation fails as the write has still >>>>> + * worked and leaving clean uptodate pages in the page cache >>>>> + * isn't a corruption vector for uncached IO. >>>>> + */ >>>>> + invalidate_inode_pages2_range(mapping, >>>>> + pos >> PAGE_SHIFT, end >> PAGE_SHIFT); >>>>> + } >>>>> +out: >>>>> return written ? written : ret; >>>>> } >>>> >>>> Just a thought on further optimisation for this for XFS. >>>> IOMAP_UNCACHED is being passed into the filesystem ->iomap_begin >>>> methods by iomap_apply(). Hence the filesystems know that it is >>>> an uncached IO that is being done, and we can tailor allocation >>>> strategies to suit the fact that the data is going to be written >>>> immediately. >>>> >>>> In this case, XFS needs to treat it the same way it treats direct >>>> IO. That is, we do immediate unwritten extent allocation rather than >>>> delayed allocation. This will reduce the allocation overhead and >>>> will optimise for immediate IO locality rather than optimise for >>>> delayed allocation. >>>> >>>> This should just be a relatively simple change to >>>> xfs_file_iomap_begin() along the lines of: >>>> >>>> - if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && !(flags & IOMAP_DIRECT) && >>>> - !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) { >>>> + if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && >>>> + !(flags & (IOMAP_DIRECT | IOMAP_UNCACHED)) && >>>> + !IS_DAX(inode) && !xfs_get_extsz_hint(ip)) { >>>> /* Reserve delalloc blocks for regular writeback. */ >>>> return xfs_file_iomap_begin_delay(inode, offset, length, flags, >>>> iomap); >>>> } >>>> >>>> so that it avoids delayed allocation for uncached IO... >>> >>> That's very handy! Thanks, I'll add that to the next version. Just out >>> of curiosity, would you prefer this as a separate patch, or just bundle >>> it with the iomap buffered RWF_UNCACHED patch? I'm assuming the latter, >>> and I'll just mention it in the changelog. >> >> OK, since it's in XFS, it'd be a separate patch. > > *nod* > >> The code you quote seems >> to be something out-of-tree? > > Ah, I quoted the code in the 5.4 release branch, not the 5.5-rc1 > tree. I'd forgotten that the xfs_file_iomap_begin() got massively > refactored in the 5.5 merge and I hadn't updated my cscope trees. SO > I'm guessing you want to go looking for the > xfs_buffered_write_iomap_begin() and add another case to this > initial branch: > > /* we can't use delayed allocations when using extent size hints */ > if (xfs_get_extsz_hint(ip)) > return xfs_direct_write_iomap_begin(inode, offset, count, > flags, iomap, srcmap); > > To make the buffered write IO go down the direct IO allocation path... Makes it even simpler! Something like this: commit 1783722cd4b7088a3c004462c7ae610b8e42b720 Author: Jens Axboe <axboe@xxxxxxxxx> Date: Tue Dec 17 07:30:04 2019 -0700 xfs: don't do delayed allocations for uncached buffered writes This data is going to be written immediately, so don't bother trying to do delayed allocation for it. Suggested-by: Dave Chinner <david@xxxxxxxxxxxxx> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 28e2d1f37267..d0cd4a05d59f 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -847,8 +847,11 @@ xfs_buffered_write_iomap_begin( int allocfork = XFS_DATA_FORK; int error = 0; - /* we can't use delayed allocations when using extent size hints */ - if (xfs_get_extsz_hint(ip)) + /* + * Don't do delayed allocations when using extent size hints, or + * if we were asked to do uncached buffered writes. + */ + if (xfs_get_extsz_hint(ip) || (flags & IOMAP_UNCACHED)) return xfs_direct_write_iomap_begin(inode, offset, count, flags, iomap, srcmap); -- Jens Axboe