[LSF/MM/BPF TOPIC] How to make fscache/cachefiles read shaping play nicely with the VM?

David Howells <dhowells@xxxxxxxxxx> · Mon, 09 Dec 2019 14:00:19 +0000

Hi,

I've been rewriting fscache and cachefiles to massively simplify them and make
use of the kiocb interface to do direct I/O to/from the netfs's pages, an
interface that didn't exist when I first wrote this code.  Until now,
cachefiles has instead been monitoring the page bit waitqueues to see when the
backing filesystem's pages become up to date.

	https://lore.kernel.org/lkml/24942.1573667720@xxxxxxxxxxxxxxxxxxxxxx/
	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

To make it more efficient, and following other network filesystem
implementations outside of the Linux kernel, I'm attempting to move to
requiring that reads and writes to the cache be done in much bigger granules
(fixed at 256KiB initially), which means that I can represent the presence of
a granule's worth of data with a single bit.
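As a rough illustration of what one bit per 256KiB granule buys, here is a
minimal userspace sketch of a presence bitmap.  The names (CACHE_GRANULE_SIZE,
granule_mark_present() and so on) are hypothetical and are not the fscache
API; it just shows the arithmetic, e.g. that a 512MiB file needs only 256
bytes of map.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed fixed granule size (256KiB, as above); names are hypothetical. */
#define CACHE_GRANULE_SIZE	(256u * 1024u)
#define BITS_PER_WORD		(sizeof(unsigned long) * CHAR_BIT)

/* Which granule a byte position falls in. */
static inline uint64_t granule_index(uint64_t pos)
{
	return pos / CACHE_GRANULE_SIZE;
}

/* Set the granule's bit once the whole granule has been written to the cache. */
static inline void granule_mark_present(unsigned long *map, uint64_t pos)
{
	uint64_t g = granule_index(pos);

	map[g / BITS_PER_WORD] |= 1UL << (g % BITS_PER_WORD);
}

/* A read at @pos can be served from the cache only if its granule's bit is set. */
static inline bool granule_is_present(const unsigned long *map, uint64_t pos)
{
	uint64_t g = granule_index(pos);

	return map[g / BITS_PER_WORD] & (1UL << (g % BITS_PER_WORD));
}

/* Bytes of bitmap needed to track a file of @file_size bytes. */
static inline uint64_t granule_map_bytes(uint64_t file_size)
{
	uint64_t granules = (file_size + CACHE_GRANULE_SIZE - 1) / CACHE_GRANULE_SIZE;

	return (granules + CHAR_BIT - 1) / CHAR_BIT;
}
```

The trade-off is that any partial write to a granule has to be rounded out to
the whole granule before its bit can be set.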

So far, I've done this for ->readpages(), ->readpage() and ->write_begin() by
taking the requested page or pages and expanding/contracting the set of pages
as necessary so that the first (or only) actually requested page is in there
and both ends of the sequence are appropriately aligned.
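The reshaping described above can be sketched roughly as follows, assuming 4KiB
pages (so 64 pages per 256KiB granule).  shape_to_granules() and struct
page_range are hypothetical names, not what fscache_read_helper() actually
does; EOF handling is left out.  The granule holding the first requested page
is always kept; with `contract` set, a partially covered trailing granule is
dropped rather than filled out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_GRANULE_PAGES	64u	/* 256KiB granule / 4KiB page (assumed) */

struct page_range {
	uint64_t start;		/* first page index */
	uint64_t nr;		/* number of pages */
};

static inline uint64_t round_down_g(uint64_t idx)
{
	return idx - idx % SKETCH_GRANULE_PAGES;
}

static inline uint64_t round_up_g(uint64_t idx)
{
	return round_down_g(idx + SKETCH_GRANULE_PAGES - 1);
}

/*
 * Reshape the request [start, start + nr) so that both ends are
 * granule-aligned.  The granule containing @start is always included, so the
 * first requested page survives.  EOF is ignored in this sketch.
 */
static struct page_range shape_to_granules(uint64_t start, uint64_t nr,
					   bool contract)
{
	uint64_t first = round_down_g(start);
	uint64_t end = start + (nr ? nr : 1);
	uint64_t last = contract ? round_down_g(end) : round_up_g(end);

	if (last <= first)
		last = first + SKETCH_GRANULE_PAGES;
	return (struct page_range){ .start = first, .nr = last - first };
}
```

For a readahead window of 32 pages at index 100, expansion yields pages
64-191 (128 pages, four times what the VM asked for), whereas contraction
yields 64-127 and leaves pages 128-131 to be fetched some other way.  Either
way the result differs from the window the VM's readahead chose.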

This, however, is at odds with the VM and *its* idea of how to do things,
particularly for ->readpages().  The logic of my fscache_read_helper()[*] is
applied after the VM's readahead logic, and the two don't necessarily see eye
to eye at present.

[*] This is in the patch named "fscache: Add read helper" in the
    above-mentioned git tree and "afs: Use new fscache I/O API" which has
    examples of using it.

There are some things that need to be taken into consideration:

 (1) I might want to make the granule size variable both by file and over the
     length of a file.  So for a file that's, say, <=512MiB in size, I might
     want 1 bit per 256KiB granule, but over 512MiB I might want to switch to
     1 bit per 1MiB granule.  Or for files that large, just use 1MiB granules
     all the way through.

 (2) The granule size might also need to vary by cache.

 (3) Some files I want to treat as monolithic.  The file is either all there
     or none of it is.  Examples might be non-regular files such as symlinks
     or directories.

 (4) These parameters might be tunable by the admin.
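One way the considerations above might be captured is a small policy object
held per cache and settable by the admin.  This is purely a hypothetical
sketch (struct granule_policy and its modes are invented for illustration):
it reads the ">512MiB" case in (1) either as a switch by file offset
(GRANULE_SPLIT) or as one size chosen by total file size (GRANULE_UNIFORM),
and treats a monolithic object as a single granule covering the whole file.

```c
#include <stdint.h>

#define KiB(x)	((uint64_t)(x) << 10)
#define MiB(x)	((uint64_t)(x) << 20)

enum granule_mode {
	GRANULE_UNIFORM,	/* one size for the whole file, chosen by file size */
	GRANULE_SPLIT,		/* small granules below @threshold, large above */
	GRANULE_MONOLITHIC,	/* the whole object is either present or absent */
};

/* Hypothetical per-cache, admin-tunable parameters. */
struct granule_policy {
	enum granule_mode mode;
	uint64_t small;		/* e.g. 256KiB */
	uint64_t large;		/* e.g. 1MiB */
	uint64_t threshold;	/* e.g. 512MiB; assumed a multiple of @small */
};

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/* Granule size covering byte @pos of a file that is @file_size bytes long. */
static uint64_t granule_size_at(const struct granule_policy *p,
				uint64_t file_size, uint64_t pos)
{
	switch (p->mode) {
	case GRANULE_MONOLITHIC:
		return file_size;
	case GRANULE_UNIFORM:
		return file_size <= p->threshold ? p->small : p->large;
	case GRANULE_SPLIT:
	default:
		return pos < p->threshold ? p->small : p->large;
	}
}

/* Number of presence bits needed for a file of @file_size bytes. */
static uint64_t granule_count(const struct granule_policy *p, uint64_t file_size)
{
	switch (p->mode) {
	case GRANULE_MONOLITHIC:
		return 1;
	case GRANULE_UNIFORM:
		return div_round_up(file_size, granule_size_at(p, file_size, 0));
	case GRANULE_SPLIT:
	default:
		if (file_size <= p->threshold)
			return div_round_up(file_size, p->small);
		return p->threshold / p->small +
		       div_round_up(file_size - p->threshold, p->large);
	}
}
```

With {small = 256KiB, large = 1MiB, threshold = 512MiB}, a 1GiB file needs
2560 bits in split mode but 1024 in uniform mode.  Note that under the split
scheme the alignment a read must honour depends on where in the file it falls,
which is exactly the information the VM's readahead doesn't have.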

So how best to make the VM deal with this?  Is it better to integrate such
logic into the VM or leave it on top?

David