Re: Inaccessible pages & folios

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Apr 12, 2021 at 02:18:09PM +0200, Claudio Imbrenda wrote:
> On Fri, 9 Apr 2021 20:40:59 +0100
> Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > I'm going to change __test_set_page_writeback() to take a folio [3]
> > and now I'm wondering what interface you'd like to use.  My
> > preference would be to rename arch_make_page_accessible() to
> > arch_make_folio_accessible() and pass a folio, at which time you
> > would make the entire folio (however many pages might be in it)
> > accessible.  If you would rather, we can leave the interface as
> > arch_make_page_accessible(), in which case we'll just call it N times
> > in __test_set_page_writeback() (and I won't need to touch gup.c).
> 
> For the rename case, how would you handle gup.c?

At first, I'd turn it into arch_make_folio_accessible(page_folio(page));

Eventually, gup.c needs to become folio-aware.  I haven't spent too much
time thinking about it, but code written like this:

                page = pte_page(pte);
                head = try_grab_compound_head(page, 1, flags);
                if (!head)
                        goto pte_unmap;
                if (unlikely(pte_val(pte) != pte_val(*ptep))) {
                        put_compound_head(head, 1, flags);
                        goto pte_unmap;
                }
                VM_BUG_ON_PAGE(compound_head(page) != head, page);

is just crying out for use of folios.  Also, some of the gup callers
would much prefer to work in terms of folios than individual struct pages
(imagine an RDMA adapter that wants to pin several gigabytes of memory
that's allocated using hugetlbfs for example).

> Consider that arch_make_page_accessible deals (typically) with KVM
> guest pages. Once you bundle up the pages in folios, you can have
> different pages in the same folio with different properties.

So what you're saying is that the host might allocate, eg a 1GB folio
for a guest, then the guest splits that up into smaller chunks (eg 1MB),
and would only want one of those small chunks accessible to the hypervisor?

> In case of failure, you could end up with a folio with some pages
> processed and some not processed. Would you stop at the first error?
> What would the state of the folio be? On s390x we use the PG_arch_1 bit
> to mark secure pages, how would that work with folios?
> 
> and how are fault handlers affected by this folio conversion? would
> they still work on pages, or would that also work on folios? on s390x
> we use the arch_make_page_accessible function in some fault handlers.

Folios can be mapped into userspace at an unaligned offset.  So we still
have to work in pages, at least for now.  We might have some optimised
path for aligned folios later.

> a possible approach maybe would be to keep the _page variant, and add a
> _folio wrapper around it

Yes, we can do that.  It's what I'm currently doing for
flush_dcache_folio().

> for s390x the PG_arch_1 is very important to prevent protected pages
> from being fed to I/O, as in that case Very Bad Things™ would happen.
> 
> sorry for the wall of questions, but I actually like your folio
> approach and I want to understand it better, so we can find a way to
> make everything work well together

Great!

> > PS: The prototype is in gfp.h.  That's not really appropriate; gfp.h
> > is about allocating memory, and this call really has nothing to do
> > with memory allocation.  I think mm.h is a better place for it, if
> > you can't find a better header file than that.
> 
> I had put it there because arch_alloc_page and arch_free_page are also
> there, and the behaviour, from a kernel point of view, is similar
> (unaccessible/unallocated pages will trigger a fault). 
> 
> I actually do not have a preference regarding where the prototype
> lives, as long as everything works. If you think mm.h is more
> appropriate, go for it :)

Heh, I see how you got there from the implementors point of view ;-)
I'll move it ...





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux