On 8/16/22 11:08, Matthew Wilcox wrote:
Some of you will already know all this, but I'll go into a certain amount of detail for the peanut gallery.

One of the problems that people want to solve with multi-page folios is supporting filesystem block sizes > PAGE_SIZE. Such filesystems already exist; you can happily create a 64kB block size filesystem on a PPC/ARM/... today, then fail to mount it on an x86 machine.

kmap_local_folio() only lets you map a single page from a folio. This works for the majority of cases (eg ->write_begin() works on a per-page basis *anyway*, so we can just map a single page from the folio). But this is somewhat hampering for ext2_get_page(), used for directory handling. A directory record may cross a page boundary (because it wasn't a page boundary on the machine which created the filesystem), and juggling two pages being mapped at once is tricky with the stack model for kmap_local.

I don't particularly want to invest heavily in optimising for HIGHMEM. The number of machines which will use multi-page folios and HIGHMEM is not going to be large, one hopes, as 64-bit kernels are far more common. I'm happy for 32-bit to be slow, as long as it works.
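To make the straddling-record problem concrete, here is a small userspace model. It is not kernel code and not what the patch below does: it only sketches the bounce-copy workaround a caller would otherwise need when one page at a time is addressable. Every name in it (model_folio, model_map_page(), model_read(), MODEL_PAGE_SIZE) is hypothetical, and the 4kB page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only. A "folio" here is an array of separately
 * allocated pages, so consecutive pages are NOT contiguous in memory.
 * That mimics HIGHMEM, where each page needs its own mapping. */
#define MODEL_PAGE_SIZE 4096u	/* assumption: 4kB pages */

struct model_folio {
	unsigned char *pages[16];	/* hypothetical cap: 16 pages (64kB) */
	size_t nr_pages;
};

/* Stand-in for mapping a single page; not a kernel API. */
static unsigned char *model_map_page(const struct model_folio *f, size_t idx)
{
	assert(idx < f->nr_pages);
	return f->pages[idx];
}

/* Copy len bytes starting at byte offset off into dst, touching only
 * one page at a time. A record that crosses a page boundary ends up
 * contiguous in dst even though its pages are not contiguous. */
static void model_read(const struct model_folio *f, size_t off,
		       void *dst, size_t len)
{
	unsigned char *out = dst;

	assert(off + len <= f->nr_pages * MODEL_PAGE_SIZE);
	while (len > 0) {
		size_t idx = off / MODEL_PAGE_SIZE;
		size_t in_page = off % MODEL_PAGE_SIZE;
		size_t chunk = MODEL_PAGE_SIZE - in_page;

		if (chunk > len)
			chunk = len;
		memcpy(out, model_map_page(f, idx) + in_page, chunk);
		out += chunk;
		off += chunk;
		len -= chunk;
	}
}
```

Reading 8 bytes at offset MODEL_PAGE_SIZE - 4 takes two chunks, one from each page. A single virtually contiguous mapping of the whole folio would let the caller use a plain pointer instead of a copy loop like this.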
Some of our kernel driver teams recently expressed precisely the same set of requirements. And at first, I pointed them to kmap_local_folio(), and then they schooled me by noting that, today, it only does a single page. :)
For these reasons, I'm proposing the logical equivalent to this:

+void *folio_map_local(struct folio *folio)
+{
+	if (!IS_ENABLED(CONFIG_HIGHMEM))
+		return folio_address(folio);
+	if (!folio_test_large(folio))
+		return kmap_local_page(&folio->page);
+	return vmap_folio(folio);
+}
...which led to a desire for code very much like the above: kmap(), with a fallback to vmap(). Always better to have such things in the kernel, rather than a zillion copies in drivers. Adding Mark Hairgrove in case I've missed any fine points?
+
+void folio_unmap_local(const void *addr)
+{
+	if (!IS_ENABLED(CONFIG_HIGHMEM))
+		return;
+	if (is_vmalloc_addr(addr))
+		vunmap(addr);
+	else
+		kunmap_local(addr);
+}

(where vmap_folio() is a new function that works a lot like vmap(), chunks of this get moved out-of-line, etc, etc., but this concept)

Does anyone have any better ideas? If it'd be easy to map N pages locally, for example ... looks like we only support up to 16 pages mapped per CPU at any time, so mapping all of a 64kB folio would almost always fail, and even mapping a 32kB folio would be unlikely to succeed.
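The slot arithmetic behind that last point can be sketched as a few lines of userspace C. This is only a back-of-the-envelope model: the 4kB page size is an assumption, the 16-slot limit is just the figure quoted above, and the names (MODEL_KMAP_SLOTS, folio_pages(), fits_in_slots()) are hypothetical rather than kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE		4096u	/* assumption: 4kB pages */
#define MODEL_KMAP_SLOTS	16u	/* assumption: the "16 pages per CPU" figure */

/* Number of pages needed to cover folio_bytes, rounding up. */
static size_t folio_pages(size_t folio_bytes)
{
	return (folio_bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Would mapping every page of the folio fit in the per-CPU slots,
 * given that slots_in_use of them are already taken? */
static int fits_in_slots(size_t folio_bytes, size_t slots_in_use)
{
	assert(slots_in_use <= MODEL_KMAP_SLOTS);
	return folio_pages(folio_bytes) <= MODEL_KMAP_SLOTS - slots_in_use;
}
```

Under these assumptions a 64kB folio needs all 16 slots, so a single mapping already held on that CPU makes it fail; a 32kB folio needs 8, which fails as soon as 9 or more slots are in use.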
thanks,
--
John Hubbard
NVIDIA