On Mon, 12 Apr 2021 14:55:14 +0100
Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:

[...]

> I was only thinking about the page cache case ...
>
> 		access_ret = arch_make_page_accessible(page);
> 		/*
> 		 * If writeback has been triggered on a page that cannot be
> 		 * made accessible, it is too late to recover here.
> 		 */
> 		VM_BUG_ON_PAGE(access_ret != 0, page);
>
> ... where it seems all pages _can_ be made accessible.

yes, for that case it is straightforward

> > also, I assume you keep the semantic difference between get_page and
> > pin_page? that's also very important for us
>
> I haven't changed anything in gup.c yet.  Just trying to get the page
> cache to suck less right now.

fair enough :)

> > > So what you're saying is that the host might allocate, eg a 1GB
> > > folio for a guest, then the guest splits that up into smaller
> > > chunks (eg 1MB), and would only want one of those small chunks
> > > accessible to the hypervisor?
> >
> > qemu will allocate a big chunk of memory, and I/O would happen only
> > on small chunks (depending on what the guest does). I don't know
> > how swap and pagecache would behave in the folio scenario.
> >
> > Also consider that currently we need 4k hardware pages for protected
> > guests (so folios would be ok, as long as they are backed by small
> > pages)
> >
> > How and when are folios actually created?
> >
> > is there a way to prevent creation of multi-page folios?
>
> Today there's no way to create multi-page folios because I haven't
> submitted the patch to add alloc_folio() and friends:
>
> https://git.infradead.org/users/willy/pagecache.git/commitdiff/4fe26f7a28ffdc850cd016cdaaa74974c59c5f53
>
> We do have a way to allocate compound pages and add them to the page
> cache, but that's only in use by tmpfs/shmem.
>
> What will happen is that (for filesystems which support multi-page
> folios), they'll be allocated by the page cache.  I expect other
> places will start to use folios after that (eg anonymous memory), but
> I don't know where all those places will be.  I hope not to be
> involved in that!
>
> The general principle, though, is that the overhead of tracking
> memory in page-sized units is too high, and we need to use larger
> units by default.  There are occasions when we need to do things to
> memory in smaller units, and for those, we can choose to either
> handle sub-folio things, or we can split a folio apart into smaller
> folios.
>
> > > > a possible approach maybe would be to keep the _page variant,
> > > > and add a _folio wrapper around it
> > >
> > > Yes, we can do that.  It's what I'm currently doing for
> > > flush_dcache_folio().
> >
> > where would the page flags be stored? as I said, we really depend on
> > that bit being set correctly to prevent potentially disruptive I/O
> > errors. It's ok if the bit overindicates protection (non-protected
> > pages can be marked as protected), but protected pages must at all
> > times have the bit set.
> >
> > the reason why this hook exists at all is to prevent secure pages
> > from being accidentally (or maliciously) fed into I/O
>
> You can still use PG_arch_1 on the sub-pages of a folio.  It's one of
> the things you'll have to decide, actually.  Does setting PG_arch_1 on
> the head page of the folio indicate that the entire folio is
> accessible, or just that the head page is accessible?  Different page
> flags have made different decisions here.

ok then, I think the simplest and safest thing to do right now is to
keep the flag on each page
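something along these lines, just as a rough sketch —
arch_make_folio_accessible() is a name I'm making up here, and I'm
assuming the folio_nr_pages()/folio_page() helpers from your series
(or equivalents) for walking the sub-pages:

	/*
	 * Hypothetical _folio wrapper around the existing per-page
	 * hook.  arch_make_page_accessible() keeps working on 4k
	 * pages, so PG_arch_1 stays per-page, as we need on s390.
	 */
	static inline int arch_make_folio_accessible(struct folio *folio)
	{
		long i, nr = folio_nr_pages(folio);
		int ret;

		for (i = 0; i < nr; i++) {
			ret = arch_make_page_accessible(folio_page(folio, i));
			if (ret)
				return ret;
		}

		return 0;
	}

that way the flag stays on each 4k page, and a failure stops the loop
before any inaccessible page can be fed into I/O.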
in short:

* pagecache -> you can put a loop or introduce a _folio wrapper for
  arch_make_page_accessible, as in the sketch above
* gup.c -> won't be touched for now, but when the time comes, the
  PG_arch_1 bit should be set for each page
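on the pagecache side, the writeback path would then keep its current
semantics, just applied to every sub-page of the folio — again only a
sketch, assuming a VM_BUG_ON_FOLIO() counterpart of VM_BUG_ON_PAGE()
exists by then:

		access_ret = arch_make_folio_accessible(folio);
		/*
		 * If writeback has been triggered on a folio that cannot be
		 * made accessible, it is too late to recover here.
		 */
		VM_BUG_ON_FOLIO(access_ret != 0, folio);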