Re: [LSF/MM/BPF TOPIC] Dropping page cache of individual fs

Christian Brauner <brauner@xxxxxxxxxx> · Wed, 17 Jan 2024 13:53:20 +0100

On Tue, Jan 16, 2024 at 12:45:19PM +0100, Jan Kara wrote:
> On Tue 16-01-24 11:50:32, Christian Brauner wrote:
> 
> <snip the usecase details>
> 
> > My initial reaction is to give userspace an API to drop the page cache
> > of a specific filesystem which may have additional uses. I initially had
> > started drafting an ioctl() and then got swayed towards a
> > posix_fadvise() flag. I found out that this was already proposed a few
> > years ago but got rejected as it was suspected this might just be
> > someone toying around without a real world use-case. I think this here
> > might qualify as a real-world use-case.
> > 
> > This may at least help securing users with a regular dm-crypt setup
> > where dm-crypt is the top layer. Users that stack additional layers on
> > top of dm-crypt may still leak plaintext of course if they introduce
> > additional caching. But that's on them.
> 
> Well, your usecase has one substantial difference from drop_caches. You
> actually *require* pages to be evicted from the page cache for security
> purposes. And giving any kind of guarantees is going to be tough. Think for
> example when someone grabs page cache folio reference through vmsplice(2),
> then you initiate your dmSuspend and want to evict page cache. What are you
> going to do? You cannot free the folio while the refcount is elevated, you
> could possibly detach it from the page cache so it isn't at least visible
> but that has side effects too - after you resume the folio would remain
> detached so it will not see changes happening to the file anymore. So IMHO
> the only thing you could do without problematic side-effects is report
> error. Which would be user unfriendly and could be actually surprisingly
> frequent due to transient folio references taken by various code paths.

I wonder though: if you start by suspending userspace and the filesystem,
how likely are you to encounter these transient errors?

> 
> Sure we could report error only if the page has pincount elevated, not only
> refcount, but it needs some serious thinking how this would interact.
> 
> Also what is going to be the interaction with mlock(2)?
> 
> Overall this doesn't seem like "just tweak drop_caches a bit" kind of
> work...

So when I talked to the Gnome people they were interested in an optimal
or a best-effort solution. So returning an error might actually be useful.
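
To make "best effort plus an error" a bit more concrete, here is a rough
userspace sketch of those semantics for a single file, using only
interfaces that exist today: fsync(), posix_fadvise(POSIX_FADV_DONTNEED)
and mincore(). It is not the per-filesystem API discussed above, and the
helper names resident_pages() and drop_file_cache() as well as the choice
of EBUSY are made up for illustration. Note also that mincore() reports
every page as resident when the caller neither owns the file nor has write
permission on it, so the check is only meaningful for the caller's own
files.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages of fd that are currently resident
 * in the page cache. Returns -1 on error.
 */
static long resident_pages(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	if (st.st_size == 0)
		return 0;

	long psz = sysconf(_SC_PAGESIZE);
	size_t len = (size_t)st.st_size;
	size_t npages = (len + (size_t)psz - 1) / (size_t)psz;

	/* Mapping the file does not fault pages in; mincore() only inspects. */
	void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return -1;

	unsigned char *vec = malloc(npages);
	long count = -1;

	if (vec && mincore(map, len, vec) == 0) {
		count = 0;
		for (size_t i = 0; i < npages; i++)
			count += vec[i] & 1;	/* bit 0: page is resident */
	}

	free(vec);
	munmap(map, len);
	return count;
}

/*
 * Hypothetical helper: best-effort drop of one file's cached pages.
 * Returns 0 if no page is resident afterwards, otherwise -1 with errno
 * set (EBUSY, an arbitrary choice, if pages survived the eviction attempt).
 */
static int drop_file_cache(int fd)
{
	/* Dirty pages are not dropped by DONTNEED, so write them back first. */
	if (fsync(fd) < 0)
		return -1;

	/* posix_fadvise() returns the error number instead of setting errno. */
	int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	if (err) {
		errno = err;
		return -1;
	}

	long left = resident_pages(fd);
	if (left < 0)
		return -1;
	if (left > 0) {
		errno = EBUSY;
		return -1;
	}
	return 0;
}
```

A caller would open the file, call drop_file_cache(fd) and treat a
failure with EBUSY as "some pages are still cached". Pages that are
mlock()ed, mapped elsewhere, or sitting on tmpfs would be expected to
show up that way, which is the kind of outcome a real interface would
have to define.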

I specifically put this here because my knowledge of the page cache
isn't sufficient to judge which guarantees are and aren't feasible.
So I'm grateful for any insight here.