Re: folio_map

On Tue, Aug 16, 2022 at 07:08:22PM +0100, Matthew Wilcox wrote:
> Some of you will already know all this, but I'll go into a certain amount
> of detail for the peanut gallery.
> 
> One of the problems that people want to solve with multi-page folios
> is supporting filesystem block sizes > PAGE_SIZE.  Such filesystems
> already exist; you can happily create a 64kB block size filesystem on
> a PPC/ARM/... today, then fail to mount it on an x86 machine.

The XFS buffer cache already supports 64kB block sizes on 4kB page
size machines - we do this with bulk page allocation and
vm_map_ram()/vm_unmap_ram() of the page arrays that are built.

These mappings are persistent (i.e. they are not kmap_local-style
temporary mappings), but if you want to prototype something before
the page cache has been fully modified to support BS > PS, the XFS
buffer cache already does what you need. Just make XFS filesystems
with "-n size=64k" to get 64kB directory blocks, then do lots of
work with directory operations on large directories.
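
For reference, the core of that persistent mapping looks roughly like
the sketch below. It is simplified from _xfs_buf_map_pages() in
fs/xfs/xfs_buf.c; the function name and error handling here are
illustrative only, and the bulk page allocation that fills the page
array is elided:

#include <linux/mm.h>
#include <linux/vmalloc.h>

/*
 * Map an array of already-allocated pages into one persistent,
 * virtually contiguous buffer.  vm_map_ram() can fail under vmalloc
 * space pressure, so flush lazily-freed mappings and retry once,
 * much like _xfs_buf_map_pages() does.
 */
static void *map_buffer_pages(struct page **pages, unsigned int nr_pages)
{
	void *addr;
	int retried = 0;

	do {
		addr = vm_map_ram(pages, nr_pages, NUMA_NO_NODE);
		if (addr)
			break;
		vm_unmap_aliases();
	} while (retried++ == 0);

	/* released later with vm_unmap_ram(addr, nr_pages) */
	return addr;
}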

> kmap_local_folio() only lets you map a single page from a folio.
> This works for the majority of cases (eg ->write_begin() works on a
> per-page basis *anyway*, so we can just map a single page from the folio).
> But this is somewhat hampering for ext2_get_page(), used for directory
> handling.  A directory record may cross a page boundary (because it
> wasn't a page boundary on the machine which created the filesystem),
> and juggling two pages being mapped at once is tricky with the stack
> model for kmap_local.

Yup, that's exactly the problem we avoid by using mapped buffers in
XFS.
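
To spell out the juggling: kmap_local mappings are stack based, so the
two mappings needed for a record that straddles a page boundary have to
be torn down in strict reverse order. An illustrative (and deliberately
simplified) sketch:

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/string.h>

/*
 * Illustrative only: copy one directory record that straddles a page
 * boundary inside a multi-page folio.  The two local mappings must be
 * released in reverse (LIFO) order because kmap_local is stack based.
 */
static void copy_straddling_record(struct folio *folio, size_t offset,
				   size_t rec_len, void *dst)
{
	/* bytes of the record that live in the first page */
	size_t in_first = PAGE_SIZE - offset_in_page(offset);
	void *first = kmap_local_folio(folio, offset);
	void *second = kmap_local_folio(folio, offset + in_first);

	memcpy(dst, first, in_first);
	memcpy(dst + in_first, second, rec_len - in_first);

	/* kmap_local is stack based: unmap in reverse order */
	kunmap_local(second);
	kunmap_local(first);
}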

> I don't particularly want to invest heavily in optimising for HIGHMEM.
> The number of machines which will use multi-page folios and HIGHMEM is
> not going to be large, one hopes, as 64-bit kernels are far more common.
> I'm happy for 32-bit to be slow, as long as it works.

Fully agree.

> For these reasons, I'm proposing the logical equivalent of this:
> 
> +void *folio_map_local(struct folio *folio)
> +{
> +       if (!IS_ENABLED(CONFIG_HIGHMEM))
> +               return folio_address(folio);
> +       if (!folio_test_large(folio))
> +               return kmap_local_page(&folio->page);
> +       return vmap_folio(folio);
> +}
> +
> +void folio_unmap_local(const void *addr)
> +{
> +       if (!IS_ENABLED(CONFIG_HIGHMEM))
> +               return;
> +       if (is_vmalloc_addr(addr))
> +               vunmap(addr);
> +       else
> +               kunmap_local(addr);
> +}
> 
> (where vmap_folio() is a new function that works a lot like vmap();
> chunks of this get moved out-of-line, etc., but that's the concept)

*nod*
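
For concreteness, one plausible shape for that vmap_folio() is to
flatten the folio into a page array and hand it to vmap(). This is
purely a sketch of the concept, not the actual proposal:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Illustrative sketch only: build a page array covering the folio and
 * hand it to vmap().
 */
static void *vmap_folio(struct folio *folio)
{
	long i, nr = folio_nr_pages(folio);
	struct page **pages;
	void *addr;

	pages = kmalloc_array(nr, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return NULL;

	for (i = 0; i < nr; i++)
		pages[i] = folio_page(folio, i);

	addr = vmap(pages, nr, VM_MAP, PAGE_KERNEL);
	kfree(pages);	/* vmap() does not keep a reference to the array */

	/* paired with vunmap() in folio_unmap_local() */
	return addr;
}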

> Does anyone have any better ideas?  If it'd be easy to map N pages
> locally, for example ... looks like we only support up to 16 pages
> mapped per CPU at any time, so mapping all of a 64kB folio would
> almost always fail, and even mapping a 32kB folio would be unlikely
> to succeed.

FWIW, what I really want for the XFS buffer cache is a large-folio-aware
variant of vm_map_ram()/vm_unmap_ram(), i.e. something we can pass a
random assortment of folios into, and it just does the right thing to
create a persistent, virtually contiguous mapping of those folios.

i.e. we have an allocation loop that tries to allocate large folios,
but falls back to smaller folios if the large allocation cannot be
fulfilled without blocking. The mapping function then works with
whatever we managed to allocate in the most efficient way...
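
A rough sketch of that sort of interface (entirely hypothetical;
neither the name nor the implementation below exists today) could look
like this:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Hypothetical folio-aware cousin of vm_map_ram(): take a mixed-order
 * folio array, flatten it into pages and map them into one persistent,
 * virtually contiguous range.
 */
static void *vm_map_folios(struct folio **folios, unsigned int nr_folios,
			   unsigned int *nr_pages_ret)
{
	unsigned int i, nr_pages = 0;
	struct page **pages;
	void *addr;
	long p = 0;

	for (i = 0; i < nr_folios; i++)
		nr_pages += folio_nr_pages(folios[i]);

	pages = kmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return NULL;

	for (i = 0; i < nr_folios; i++) {
		long j;

		for (j = 0; j < folio_nr_pages(folios[i]); j++)
			pages[p++] = folio_page(folios[i], j);
	}

	addr = vm_map_ram(pages, nr_pages, NUMA_NO_NODE);
	kfree(pages);			/* vm_map_ram() does not keep it */
	*nr_pages_ret = nr_pages;	/* for vm_unmap_ram(addr, nr_pages) */
	return addr;
}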

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


