On Tue, Jan 31, 2023 at 8:37 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> On Sun, Jan 08, 2023 at 08:40:29PM +0100, Andreas Gruenbacher wrote:
> > +static struct folio *
> > +gfs2_iomap_page_prepare(struct iomap_iter *iter, loff_t pos, unsigned len)
> > {
> > +	struct inode *inode = iter->inode;
> > 	unsigned int blockmask = i_blocksize(inode) - 1;
> > 	struct gfs2_sbd *sdp = GFS2_SB(inode);
> > 	unsigned int blocks;
> > +	struct folio *folio;
> > +	int status;
> >
> > 	blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
> > -	return gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
> > +	status = gfs2_trans_begin(sdp, RES_DINODE + blocks, 0);
> > +	if (status)
> > +		return ERR_PTR(status);
> > +
> > +	folio = iomap_get_folio(iter, pos);
> > +	if (IS_ERR(folio))
> > +		gfs2_trans_end(sdp);
> > +	return folio;
> > }
>
> Hi Andreas,

Hello,

> I didn't think to mention this at the time, but I was reading through
> buffered-io.c and this jumped out at me. For filesystems which support
> folios, we pass the entire length of the write (or at least the length
> of the remaining iomap length). That's intended to allow us to decide
> how large a folio to allocate at some point in the future.
>
> For GFS2, we do this:
>
> 	if (!mapping_large_folio_support(iter->inode->i_mapping))
> 		len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));
>
> I'd like to drop that and pass the full length of the write to
> ->get_folio(). It looks like you'll have to clamp it yourself at this
> point.

That sounds reasonable to me. I see that gfs2_page_add_databufs() hasn't
been folio-ized yet, but it looks like it might just work anyway. So
gfs2_iomap_get_folio() ... gfs2_iomap_put_folio() should, in principle,
work for requests bigger than PAGE_SIZE.

Is there a reasonable way of trying it out?
We still want to keep the transaction size somewhat reasonable, but the
maximum size gfs2_iomap_begin() will return for a write is 509 blocks on
a 4k-block filesystem, or slightly less than 2 MiB, which should be
fine.

> I am kind of curious why you do one transaction per page -- I would
> have thought you'd rather do one transaction for the entire write.

Only for journaled data writes. We could probably do bigger transactions
even in that case, but we'd rather get rid of data journaling than
encourage it, so we're also not spending a lot of time on optimizing
this case.

Thanks,
Andreas