On Tue, Mar 07, 2023 at 10:11:43PM -0800, Luis Chamberlain wrote:
> On Sun, Mar 05, 2023 at 05:02:43AM +0000, Matthew Wilcox wrote:
> > On Sat, Mar 04, 2023 at 08:15:50PM -0800, Luis Chamberlain wrote:
> > > On Sat, Mar 04, 2023 at 04:39:02PM +0000, Matthew Wilcox wrote:
> > > > XFS already works with arbitrary-order folios.
> > >
> > > But block sizes > PAGE_SIZE is work which is still not merged. It
> > > *can* be with time. That would allow one to muck with larger block
> > > sizes than 4k on x86-64 for instance. Without this, you can't play
> > > ball.
> >
> > Do you mean that XFS is checking that fs block size <= PAGE_SIZE and
> > that check needs to be dropped? If so, I don't see where that happens.
>
> None of that. Back in 2018 Chinner had prototyped XFS support with
> larger block size > PAGE_SIZE:
>
> https://lwn.net/ml/linux-fsdevel/20181107063127.3902-1-david@xxxxxxxxxxxxx/

Having a working BS > PS implementation on XFS based on variable page
order support in the page cache goes back over a decade before that.
Christoph Lameter did the page cache work, and I added support for XFS
back in 2007. The total change to XFS required can be seen in this
simple patch:

https://lore.kernel.org/linux-mm/20070423093152.GI32602149@xxxxxxxxxxxxxxxxx/

That was when the howls of anguish about high order allocations Willy
mentioned started....

> I just did a quick attempt to rebase it and most of the leftover work
> is actually in iomap for writeback and zeroing / writes requiring new
> zero-around functionality.
> All bugs on the rebase are my own, only compile tested so far, and I'm
> not happy with some of the changes I had to make, so it likely could
> use tons more love:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20230307-larger-bs-then-ps-xfs

On a current kernel, that patchset is fundamentally broken as we have
multi-page folio support in XFS and iomap - the patchset is inherently
PAGE_SIZE based and it will do the wrong thing with PAGE_SIZE based
zero-around.

IOWs, IOMAP_F_ZERO_AROUND does not need to exist any more, nor should
any of the custom hooks it triggered in different operations for
zero-around. That's because we should now be using the same approach to
BS > PS as we first used back in 2007.

We already support multi-page folios in the page cache, so all the
zero-around and partial folio uptodate tracking we need is already in
place. Hence, like Willy said, all we need to do is have
filemap_get_folio(FGP_CREAT) always allocate folios that are at least
filesystem block sized and aligned, and insert them into the mapping
tree.

Multi-page folios will always need to be sized as an integer multiple
of the filesystem block size, but once we ensure the size and alignment
of folios in the page cache, we get everything else for free.

/me cues the howls of anguish over memory fragmentation....

> But it should give you an idea of what type of things filesystems
> need to do.

Not really. It gives you an idea of what filesystems needed to do 5
years ago to support BS > PS. We're living in the age of folios now,
not pages.

Willy starting work on folios was why I dropped that patch set: firstly
because it was going to make the iomap conversion to folios harder, and
secondly because we realised that none of it was necessary if the page
cache natively supported multi-page folios.
IOWs, multi-page folios in the page cache should make BS > PS mostly
trivial to support for any filesystem or block device that doesn't have
some other dependency on PAGE_SIZE objects in the page cache (e.g.
bufferheads).

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx