On Fri, Mar 03, 2023 at 08:11:47AM -0500, James Bottomley wrote:
> On Fri, 2023-03-03 at 03:49 +0000, Matthew Wilcox wrote:
> > On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
> > > That said, I was hoping you were going to suggest supporting 16k
> > > logical block sizes. Not a problem on some arch's, but still
> > > problematic when PAGE_SIZE is 4k. :)
> >
> > I was hoping Luis was going to propose a session on LBA size >
> > PAGE_SIZE. Funnily, while the pressure is coming from the storage
> > vendors, I don't think there's any work to be done in the storage
> > layers. It's purely a FS+MM problem.
>
> Heh, I can do the fools rush in bit, especially if what we're
> interested in the minimum it would take to support this ...
>
> The FS problem could be solved simply by saying FS block size must
> equal device block size, then it becomes purely a MM issue.

Spoken like somebody who's never converted a filesystem to
supporting large folios. There are a number of issues:

1. The obvious; use of PAGE_SIZE and/or PAGE_SHIFT
2. Use of kmap-family to access, eg directories. You can't kmap
   an entire folio, only one page at a time. And if a dentry is split
   across a page boundary ...
3. buffer_heads do not currently support large folios. Working on it.

Probably a few other things I forget. But look through the recent
patches to AFS, CIFS, NFS, XFS, iomap that do folio conversions.
A lot of it is pretty mechanical, but some of it takes hard thought.
And if you have ideas about how to handle ext2 directories, I'm all ears.

> The MM
> issue could be solved by adding a page order attribute to struct
> address_space and insisting that pagecache/filemap functions in
> mm/filemap.c all have to operate on objects that are an integer
> multiple of the address space order. The base allocator is
> filemap_alloc_folio, which already has an apparently always zero order
> parameter (hmmm...) and it always seems to be called from sites that
> have the address_space, so it could simply be modified to always
> operate at the address_space order.

Oh, I have a patch for that. That's the easy part. The hard part is
plugging your ears to the screams of the MM people who are convinced that
fragmentation will make it impossible to mount your filesystem.
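
For anyone following along, here is a userspace sketch of the arithmetic
that a per-address_space order would imply. It is not the patch mentioned
above and not kernel code: struct mapping_model, its min_order field, and
the three helper names are hypothetical, and PAGE_SHIFT is assumed to be 12
(4k pages).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4k pages, as in the case discussed in this thread. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for struct address_space; only the one
 * field this sketch needs. */
struct mapping_model {
	unsigned int min_order;	/* hypothetical: smallest folio order allowed */
};

/* Smallest order such that (PAGE_SIZE << order) covers one block. */
static unsigned int order_for_block_size(size_t block_size)
{
	unsigned int order = 0;

	while ((PAGE_SIZE << order) < block_size)
		order++;
	return order;
}

/* Round a page index down to the first page of the folio that must
 * contain it when every folio is 2^min_order pages and naturally aligned. */
static unsigned long folio_start_index(const struct mapping_model *m,
				       unsigned long index)
{
	return index & ~((1UL << m->min_order) - 1);
}

/* Bytes covered by one minimum-order folio. */
static size_t folio_bytes(const struct mapping_model *m)
{
	return PAGE_SIZE << m->min_order;
}
```

With a 16k block size on 4k pages this gives order 2, so a lookup of page
index 7 has to find or allocate the four-page folio starting at index 4.
Every such allocation needs four physically contiguous pages, which is
where the fragmentation worry comes from.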