On Thu, Aug 29, 2024 at 01:50:59PM +1000, Dave Chinner wrote: > On Wed, Aug 28, 2024 at 05:13:56PM -0400, Josef Bacik wrote: > > We're currently using the old ->write_begin()/->write_end() method of > > doing buffered writes. This isn't a huge deal for fuse since we > > basically just want to copy the pages and move on, but the iomap > > infrastructure gives us access to having huge folios. Rework the > > buffered write path when we have writeback cache to use the iomap > > buffered write code, the ->get_folio() callback now handles the work > > that we did in ->write_begin(), the rest of the work is handled inside > > of iomap so we don't need a replacement for ->write_end. > > > > This does bring BLOCK as a dependency, as the buffered write part of > > iomap requires CONFIG_BLOCK. This could be shed if we reworked the file > > write iter portion of the buffered write path was separated out to not > > need BLOCK. > > > > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> > > --- > > fs/fuse/Kconfig | 2 + > > fs/fuse/file.c | 154 +++++++++++++++++++++--------------------------- > > 2 files changed, 68 insertions(+), 88 deletions(-) > > > > diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig > > index 8674dbfbe59d..8a799324d7bd 100644 > > --- a/fs/fuse/Kconfig > > +++ b/fs/fuse/Kconfig > > @@ -1,7 +1,9 @@ > > # SPDX-License-Identifier: GPL-2.0-only > > config FUSE_FS > > tristate "FUSE (Filesystem in Userspace) support" > > + depends on BLOCK > > select FS_POSIX_ACL > > + select FS_IOMAP > > help > > With FUSE it is possible to implement a fully functional filesystem > > in a userspace program. > > diff --git a/fs/fuse/file.c b/fs/fuse/file.c > > index ab531a4694b3..af91043b44d7 100644 > > --- a/fs/fuse/file.c > > +++ b/fs/fuse/file.c > > @@ -21,6 +21,7 @@ > > #include <linux/filelock.h> > > #include <linux/splice.h> > > #include <linux/task_io_accounting_ops.h> > > +#include <linux/iomap.h> > > > > static int fuse_send_open(struct fuse_mount *fm, u64 nodeid, > > unsigned int open_flags, int opcode, > > @@ -1420,6 +1421,63 @@ static void fuse_dio_unlock(struct kiocb *iocb, bool exclusive) > > } > > } > > > > +static struct folio *fuse_iomap_get_folio(struct iomap_iter *iter, > > + loff_t pos, unsigned len) > > +{ > > + struct file *file = (struct file *)iter->private; > > + struct inode *inode = iter->inode; > > + struct folio *folio; > > + loff_t fsize; > > + > > + folio = iomap_get_folio(iter, pos, len); > > + if (IS_ERR(folio)) > > + return folio; > > + > > + fuse_wait_on_folio_writeback(inode, folio); > > + > > + if (folio_test_uptodate(folio)) > > + return folio; > > + > > + /* > > + * If we're going to write past EOF then avoid the read, but zero the > > + * whole thing and mark it uptodate so that if we get a short write we > > + * don't try to re-read this page, we just carry on. > > + */ > > + fsize = i_size_read(inode); > > + if (fsize <= folio_pos(folio)) { > > + folio_zero_range(folio, 0, folio_size(folio)); > > The comment doesn't match what this does - the folio is not marked > uptodate at all. I'll update the comment, it gets marked uptodate in __iomap_write_end() once the write is complete. > > > + } else { > > + int err = fuse_do_readpage(file, &folio->page); > > readpage on a large folio? does that work? I haven't done the work to enable large folios yet, this is just the prep stuff. Supporting large folios is going to take a fair bit of work, so I'm getting the ball rolling with this prep series. > > > + if (err) { > > + folio_unlock(folio); > > + folio_put(folio); > > + return ERR_PTR(err); > > + } > > + } > > Also, why do this here when __iomap_write_begin() will do all the > sub-folio zeroing and read IO on the folio? > I looked long and hard at iomap because I thought it would, but it turns out it won't work for fuse. I could be totally wrong, but looking at iomap it will allocate an ifs because it assumes this is sub-folio blocksize, but we aren't, and don't really want to incur that pain. Additionally it does iomap_read_folio_sync() to read in the folio, which just does a bio, which obviously doesn't work on fuse. Again totally expecting to be told I'm stupid in some way that I missed, but it seemed like iomap won't do what we need it to do here, and it's simple enough to handle the zeroing here for ourselves. > > + > > + return folio; > > +} > > + > > +static const struct iomap_folio_ops fuse_iomap_folio_ops = { > > + .get_folio = fuse_iomap_get_folio, > > +}; > > + > > +static int fuse_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length, > > + unsigned flags, struct iomap *iomap, > > + struct iomap *srcmap) > > +{ > > + iomap->type = IOMAP_DELALLOC; > > + iomap->addr = IOMAP_NULL_ADDR; > > + iomap->offset = pos; > > + iomap->length = length; > > + iomap->folio_ops = &fuse_iomap_folio_ops; > > + return 0; > > +} > > What's the reason for using IOMAP_DELALLOC for these mappings? I'm > not saying it is wrong, I just don't know enough about fuse to > understand is this is valid or not because there are no iomap-based > writeback hooks being added here.... At the time it was "oh we're doing what equates to delalloc, clearly this should be marked as delalloc". Now that I have been in this code longer I realize this is supposed to be "what does this range look like now", so I suppose the "right" thing to do here is use IOMAP_HOLE? Thanks, Josef