Re: API for setting multiple PTEs at once

On Fri, Feb 03, 2023 at 12:48:58AM +0300, Kirill A. Shutemov wrote:
> On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> > For those of you not subscribed, linux-mm is currently discussing
> > how best to handle page faults on large folios.  I simply made it work
> > when adding large folio support.  Now Yin Fengwei is working on
> > making it fast.
> > 
> > https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@xxxxxxxxxxxxxxxxxxxx/
> > is perhaps the best place to start as it pertains to what the
> > architecture will see.
> > 
> > At the bottom of that function, I propose
> > 
> > +	for (i = 0; i < nr; i++) {
> > +		set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
> > +		/* no need to invalidate: a not-present page won't be cached */
> > +		update_mmu_cache(vma, addr, vmf->pte + i);
> > +		addr += PAGE_SIZE;
> > +		entry = pte_next(entry);
> > +	}
> > 
> > (or I would have, had I not forgotten that pte_t isn't an integral type)
> > 
> > But I think that some architectures want to mark PTEs specially for
> > "This is part of a contiguous range" -- ARM, perhaps?  So would you like
> > an API like:
> > 
> > 	arch_set_ptes(mm, addr, vmf->pte, entry, nr);
> 
> Maybe just set_ptes(). arch_ doesn't contribute much.

Sure.
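
For concreteness, a generic fallback could be little more than the quoted
loop moved behind the new name.  The following is only a userspace sketch
under stated assumptions, not kernel code: pte_t here is a mock struct with
an invented bit layout, pte_next() is the hypothetical helper from the
snippet above, and the mm and addr arguments are dropped because the mock
has no use for them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4KiB pages */

/* Mock PTE: pfn in the high bits, flags in the low PAGE_SHIFT bits. */
typedef struct { uint64_t val; } pte_t;

static inline pte_t mk_pte(uint64_t pfn, uint64_t flags)
{
	pte_t pte = { (pfn << PAGE_SHIFT) | flags };
	return pte;
}

static inline uint64_t pte_pfn(pte_t pte)
{
	return pte.val >> PAGE_SHIFT;
}

/* Hypothetical helper: same flags, next page frame. */
static inline pte_t pte_next(pte_t pte)
{
	pte.val += (uint64_t)1 << PAGE_SHIFT;
	return pte;
}

/* Generic fallback: fill nr consecutive slots, one page frame apart. */
static inline void set_ptes(pte_t *ptep, pte_t entry, size_t nr)
{
	assert(nr > 0);
	for (size_t i = 0; i < nr; i++) {
		ptep[i] = entry;
		entry = pte_next(entry);
	}
}
```

An architecture with a "contiguous" hint would presumably override this and
decide for itself whether addr and nr allow the hint to be set.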

> > 	update_mmu_cache_range(vma, addr, vmf->pte, nr);
> > 
> > There are some challenges here.  For example, folios may be mapped
> > askew (ie not naturally aligned).  Another problem is that folios may
> > be unmapped in part (eg mmap(), fault, followed by munmap() of one of
> > the pages in the folio), and I presume you'd need to go and unmark the
> > other PTEs in that case.  So it's not as simple as just checking whether
> > 'addr' and 'nr' are in some way compatible.
> 
> I think the key question is who is responsible for 'nr' being safe: does
> the caller or set_ptes() need to check that the range belongs to the same
> PTE page table, folio, VMA, etc.?
> 
> I think it has to be done by the caller, and set_ptes() has to be as
> simple as possible.

Caller guarantees that 'nr' is bounded by all of (vma, PMD table, folio).
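
As a rough illustration of that contract, the caller's clamp might look like
the sketch below.  It is a userspace model, not a proposed implementation:
max_nr_ptes() and its parameters are hypothetical names, the shifts assume
4KiB pages with a 2MiB span per PTE table, and it only clamps forward from
addr (a caller that also maps pages before the faulting address would need
the matching lower bounds).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumption: 4KiB pages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PMD_SHIFT  21			/* assumption: one PTE table maps 2MiB */
#define PMD_SIZE   ((uint64_t)1 << PMD_SHIFT)

static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical caller-side helper: how many PTEs may be set starting at
 * addr.  folio_left is the number of pages from the page mapped at addr
 * to the end of the folio; vma_end is assumed to be page aligned.
 */
static inline uint64_t max_nr_ptes(uint64_t addr, uint64_t vma_end,
				   uint64_t folio_left)
{
	/* First address past the PTE table that covers addr. */
	uint64_t table_end = (addr & ~(PMD_SIZE - 1)) + PMD_SIZE;
	uint64_t end = min_u64(vma_end, table_end);

	assert((addr & (PAGE_SIZE - 1)) == 0 && addr < vma_end);
	return min_u64((end - addr) >> PAGE_SHIFT, folio_left);
}
```

With these assumptions, a 16-page folio whose mapping starts two pages before
a 2MiB boundary gets nr = 2 for the first call, and the remaining pages have
to go through a second call against the next PTE table.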

We don't currently allocate folios larger than PMD size, but perhaps we
should prepare for that and as part of this same exercise define

	set_pmds(mm, addr, vmf->pmd, entry, nr);

... where 'nr' is the number of PMDs to set, not number of pages.


