Re: [LSF/MM/BPF TOPIC] Page table manipulation primitives

On Fri, Feb 07, 2020 at 08:45:53PM +0300, Kirill A. Shutemov wrote:
> On Thu, Feb 06, 2020 at 09:34:10AM -0800, Matthew Wilcox wrote:
> > On Thu, Feb 06, 2020 at 06:57:41PM +0200, Mike Rapoport wrote:
> > > While updating the architectures to properly use 5-level folded page tables
> > > without <asm-generic/?level-fixup.h> and <asm-generic/pgtable-nop4d-hack.h>
> > > I wondered if we can do better than explicitly name each and every level of
> > > the page table, open-code traversal of all the layers numerous times and
> > > have copied do_something_pXd_range().
> > > 
> > > Then I've come across Kirill's "Proof-of-concept: better(?) page-table
> > > manipulation API" [1], but as far as I could see there was no progress
> > > since then.
> > > 
> > > I'd like to resurrect the topic and try to see if we can come up with
> > > an actually better page table manipulation API.
> > > 
> > > [1] https://lore.kernel.org/lkml/20180424154355.mfjgkf47kdp2by4e@xxxxxxxxxxxxxxxxxx/
> 
> I played a bit more with it after that, but got distracted by other stuff.
> I'll see if I'll be able to come up with an update.
> 
> > I don't think this approach helps support 64k pages on ARM
> 
> Could you specify what such support would require?

For 64kB pages with a base 4kB page size, you set a special bit in 16 adjacent
aligned PTEs.  When the MMU sees that bit set, it uses a 64kB TLB entry.  So
I think what we want for a fully generic interface is:

void set_vpte_at(struct mm_struct *, unsigned long addr, vpte_iter *, vpte_t,
		unsigned int order);

(maybe we don't need an 'order' here; perhaps it's embedded in the vpte_iter)
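
To make the 16-adjacent-PTEs case concrete, here is a rough userspace sketch
of what an order-aware setter might do.  Everything in it is hypothetical:
the toy_ names, the flat array standing in for a PTE table, and the bit
chosen for the hint are assumptions for illustration only, and the mm
argument from the prototype above is dropped for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these types or macros exist in the kernel. */
typedef uint64_t vpte_t;

#define TOY_PAGE_SHIFT	12		/* 4kB base pages */
#define TOY_PTE_CONT	(1ULL << 52)	/* stand-in for the "special bit" */
#define TOY_NR_PTES	64

struct toy_vpte_iter {
	vpte_t *table;		/* flat array standing in for one PTE table */
	unsigned long base;	/* virtual address mapped by table[0] */
};

/* Write 1 << order adjacent entries; set the hint bit when order > 0. */
static void toy_set_vpte_at(struct toy_vpte_iter *iter, unsigned long addr,
			    vpte_t vpte, unsigned int order)
{
	size_t nr = (size_t)1 << order;
	size_t idx = (addr - iter->base) >> TOY_PAGE_SHIFT;
	size_t i;

	assert(idx % nr == 0);		/* the run must be naturally aligned */
	assert(idx + nr <= TOY_NR_PTES);

	for (i = 0; i < nr; i++) {
		/* each entry maps the next 4kB of the same physical run */
		vpte_t entry = vpte + ((vpte_t)i << TOY_PAGE_SHIFT);

		if (order)
			entry |= TOY_PTE_CONT;
		iter->table[idx + i] = entry;
	}
}
```

With order == 4 the caller makes one call and the 16 entries are filled in
behind the interface; with order == 0 it degenerates to a single plain PTE
write.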

> > , for example,
> > so it doesn't solve enough problems to be worth doing.  I'd favour
> > an interface which looked more like this:
> > 
> > 	vpte_iter iter;
> > 	vpte_t vpte;
> > 
> > 	vpte_iter_for_each(vpte, iter, start, end, flags) {
> > 		unsigned char order = vpte_order(&iter);
> > 		... do things based on vpte and order ...
> > 	}
> 
> It looks like just a higher-level API that can be provided over my
> approach. Maybe it should be the default go-to. But I find it useful to be
> able to go into low-level details where it matters.

I think the key difference is that I would not embed the 'order' in the
vpte, but keep it in the iter.  I don't know that every architecture has
the ability to tell from a union { pte_t, pmd_t, pud_t, p4d_t, pgd_t }
which of the levels it is.

Looking at the code you provided, another difference is that your method
involves a recursive call for each level of the page tables.  I'd rather
express these kinds of things as "I would like to iterate over each
page table entry in this range" than "Have I got to the bottom?  If not,
recursively call myself".  IOW vpte_iter_for_each() would work its way
down to the lowest level, and keep track of where it is in the iter,
so when moving to the next entry in the tree, it knows whether to go up
before going sideways, and then down as far as it needs to.
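
As a rough illustration of the up/sideways/down bookkeeping, here is a toy
userspace sketch.  It is not a proposal for the real data structures: the
three-level tree, the fan-out of 4, and all toy_ names are assumptions made
up for the example.  The point is only that the iterator remembers a table
and an index per level, so no recursion is needed, and that the order is
reported by the iterator rather than by the entry.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LEVELS	3	/* level 0 is the top of the tree */
#define TOY_FANOUT	4	/* entries per table, tiny for illustration */
#define TOY_BITS	2	/* log2(TOY_FANOUT) */

struct toy_entry {
	struct toy_entry *child;	/* next-level table, or NULL */
	int present;			/* a leaf mapping lives in this slot */
};

struct toy_iter {
	struct toy_entry *table[TOY_LEVELS];	/* table walked at each level */
	int idx[TOY_LEVELS];			/* next slot to visit per level */
	int level;				/* current depth, -1 when done */
	unsigned int order;			/* order of the last entry returned */
};

static void toy_iter_init(struct toy_iter *it, struct toy_entry *top)
{
	it->table[0] = top;
	it->idx[0] = 0;
	it->level = 0;
	it->order = 0;
}

/* Return the next present leaf, or NULL once the walk is finished. */
static struct toy_entry *toy_iter_next(struct toy_iter *it)
{
	while (it->level >= 0) {
		int l = it->level;
		struct toy_entry *e;

		if (it->idx[l] == TOY_FANOUT) {	/* table exhausted: go up */
			it->level--;
			continue;
		}
		e = &it->table[l][it->idx[l]++];	/* go sideways */
		if (e->child && l + 1 < TOY_LEVELS) {	/* go down */
			it->table[l + 1] = e->child;
			it->idx[l + 1] = 0;
			it->level = l + 1;
		} else if (e->present) {
			/* a leaf higher in the tree covers more base pages */
			it->order = (TOY_LEVELS - 1 - l) * TOY_BITS;
			return e;
		}
	}
	return NULL;
}
```

A caller just loops on toy_iter_next() until it returns NULL and reads
it->order for each entry; a leaf found at the middle level reports order 2
here, and one at the top level reports order 4.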

Whatever we come up with, we should be able to collapse away the levels
which aren't needed, and support whatever non-PTE-level TLB orders the
hardware supports without forcing support for those orders on x86 code.

I don't have a good solution for how to express the 'copy_pt_range' in
your example, where we need to iterate two mms at the same time.  Maybe
that's a special iterator which does exactly that.



