On 5/26/21 11:07 PM, Keith Busch wrote:
> On Fri, May 14, 2021 at 06:48:26PM +0100, Matthew Wilcox wrote:
>> On Mon, May 10, 2021 at 06:56:17PM +0100, Matthew Wilcox wrote:
>>> I don't know exactly how much will be left to discuss about supporting
>>> larger memory allocation units in the page cache by December. In my
>>> ideal world, all the patches I've submitted so far are accepted, I
>>> persuade every filesystem maintainer to convert their own filesystem
>>> and struct page is nothing but a bad memory by December. In reality,
>>> I'm just not that persuasive.
>>>
>>> So, probably some kind of discussion will be worthwhile about
>>> converting the remaining filesystems to use folios, when it's worth
>>> having filesystems opt-in to multi-page folios, what we can do about
>>> buffer-head based filesystems, and so on.
>>>
>>> Hopefully we aren't still discussing whether folios are a good idea
>>> or not by then.
>>
>> I got an email from Hannes today asking about memory folios as they
>> pertain to the block layer, and I thought this would be a good chance
>> to talk about them. If you're not familiar with the term "folio",
>> https://lore.kernel.org/lkml/20210505150628.111735-10-willy@xxxxxxxxxxxxx/
>> is not a bad introduction.
>>
>> Thanks to the work done by Ming Lei in 2017, the block layer already
>> supports multipage bvecs, so to a first order of approximation, I don't
>> need anything from the block layer on down through the various storage
>> layers. Which is why I haven't been talking to anyone in storage!
>>
>> It might change (slightly) the contents of bios. For example,
>> bvec[n]->bv_offset might now be larger than PAGE_SIZE. Drivers should
>> handle this OK, but probably haven't been audited to make sure they do.
>> Mostly, it's simply that drivers will now see fewer, larger, segments
>> in their bios. Once a filesystem supports multipage folios, we will
>> allocate order-N pages as part of readahead (and sufficiently large
>> writes). Dirtiness is tracked on a per-folio basis (not per page),
>> so folios take trips around the LRU as a single unit and finally make
>> it to being written back as a single unit.
>>
>> Drivers still need to cope with sub-folio-sized reads and writes.
>> O_DIRECT still exists and (eg) doing a sub-page, block-aligned write
>> will not necessarily cause readaround to happen. Filesystems may read
>> and write their own metadata at whatever granularity and alignment they
>> see fit. But the vast majority of pagecache I/O will be folio-sized
>> and folio-aligned.
>>
>> I do have two small patches which make it easier for the one
>> filesystem that I've converted so far (iomap/xfs) to add folios to bios
>> and get folios back out of bios:
>>
>> https://lore.kernel.org/lkml/20210505150628.111735-72-willy@xxxxxxxxxxxxx/
>> https://lore.kernel.org/lkml/20210505150628.111735-73-willy@xxxxxxxxxxxxx/
>>
>> as well as a third patch that estimates how large a bio to allocate,
>> given the current folio that it's working on:
>> https://git.infradead.org/users/willy/pagecache.git/commitdiff/89541b126a59dc7319ad618767e2d880fcadd6c2
>>
>> It would be possible to make other changes in future. For example, if
>> we decide it'd be better, we could change bvecs from being (page, offset,
>> length) to (folio, offset, length). I don't know that it's worth doing;
>> it would need to be evaluated on its merits. Personally, I'd rather
>> see us move to a (phys_addr, length) pair, but I'm a little busy at the
>> moment.
>>
>> Hannes has some fun ideas about using the folio work to support larger
>> sector sizes, and I think they're doable.
>
> I'm also interested in this, and was looking into the exact same thing
> recently. Some of the very high capacity SSDs that can really benefit
> from better large sector support. If this is a topic for the conference,
> I would like to attend this session.
>
And, of course, so would I :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)
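
To make the bv_offset remark in Matthew's message concrete, here is a minimal userspace sketch in Python (not kernel code) of how a driver that works one page at a time could walk a multipage segment described as (page, offset, length) when the offset is allowed to exceed PAGE_SIZE. The names Segment and split_by_page and the 4096-byte PAGE_SIZE are assumptions made for illustration; the kernel's own bvec iterators are not modelled here.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

PAGE_SIZE = 4096  # assumed page size for this sketch


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a bvec: (first page, offset, length)."""
    page: int    # index of the first page backing the segment
    offset: int  # byte offset from the start of that page; may exceed PAGE_SIZE
    length: int  # number of bytes covered by the segment


def split_by_page(seg: Segment) -> Iterator[Tuple[int, int, int]]:
    """Yield (page, offset_in_page, length) chunks that never cross a page boundary."""
    # Fold whole pages contained in the offset into the starting page index.
    page = seg.page + seg.offset // PAGE_SIZE
    offset = seg.offset % PAGE_SIZE
    remaining = seg.length
    while remaining > 0:
        chunk = min(remaining, PAGE_SIZE - offset)
        yield (page, offset, chunk)
        remaining -= chunk
        page += 1
        offset = 0  # every chunk after the first starts at a page boundary
```

For example, a segment starting at page 10 with offset 5000 and length 8192 is walked as three chunks: (11, 904, 3192), (12, 0, 4096) and (13, 0, 904). A driver that assumed the offset was always smaller than PAGE_SIZE would start on the wrong page, which is the kind of case the message says has probably not been audited.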