Re: API for setting multiple PTEs at once

On Thu, Feb 02, 2023 at 10:49:38PM +0000, Matthew Wilcox wrote:
> On Fri, Feb 03, 2023 at 12:48:58AM +0300, Kirill A. Shutemov wrote:
> > On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> > > For those of you not subscribed, linux-mm is currently discussing
> > > how best to handle page faults on large folios.  I simply made it work
> > > when adding large folio support.  Now Yin Fengwei is working on
> > > making it fast.
> > > 
> > > https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@xxxxxxxxxxxxxxxxxxxx/
> > > is perhaps the best place to start as it pertains to what the
> > > architecture will see.
> > > 
> > > At the bottom of that function, I propose
> > > 
> > > +	for (i = 0; i < nr; i++) {
> > > +		set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
> > > +		/* no need to invalidate: a not-present page won't be cached */
> > > +		update_mmu_cache(vma, addr, vmf->pte + i);
> > > +		addr += PAGE_SIZE;
> > > +		entry = pte_next(entry);
> > > +	}
> > > 
> > > (or I would have, had I not forgotten that pte_t isn't an integral type)
> > > 
> > > But I think that some architectures want to mark PTEs specially for
> > > "This is part of a contiguous range" -- ARM, perhaps?  So would you like
> > > an API like:
> > > 
> > > 	arch_set_ptes(mm, addr, vmf->pte, entry, nr);
> > 
> > Maybe just set_ptes(). arch_ doesn't contribute much.
> 
> Sure.
> 
> > > 	update_mmu_cache_range(vma, addr, vmf->pte, nr);
> > > 
> > > There are some challenges here.  For example, folios may be mapped
> > > askew (ie not naturally aligned).  Another problem is that folios may
> > > be unmapped in part (eg mmap(), fault, followed by munmap() of one of
> > > the pages in the folio), and I presume you'd need to go and unmark the
> > > other PTEs in that case.  So it's not as simple as just checking whether
> > > 'addr' and 'nr' are in some way compatible.
> > 
> > I think the key question is who is responsible for 'nr' being safe: is
> > it the caller, or does set_ptes() need to check that the range belongs
> > to the same PTE page table, folio, VMA, etc.?
> > 
> > I think it has to be done by the caller, and set_ptes() has to be as
> > simple as possible.
> 
> Caller guarantees that 'nr' is bounded by all of (vma, PMD table, folio).

The caller is also responsible for taking all relevant locks.
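
To make the "bounded by all of (vma, PMD table, folio)" contract a bit more
concrete, here is a rough userspace sketch of what a caller might compute
before calling set_ptes(). Everything in it is an assumption for
illustration: clamp_nr_ptes() is a hypothetical helper, not an existing
kernel function, and the constants assume 4KiB pages with one PTE table
covering 2MiB (x86-64 style). Locking is not modelled at all.

```c
#include <stddef.h>

/* Assumed geometry: 4KiB pages, one PTE table maps 2MiB (x86-64 style). */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PMD_SIZE   (1UL << PMD_SHIFT)

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper (not a kernel API): clamp a requested PTE count so
 * the range stays inside the VMA, inside the PTE table that covers 'addr',
 * and inside the folio.
 *
 * addr             - page-aligned address of the first PTE to set
 * vma_end          - page-aligned end of the VMA (exclusive), > addr
 * folio_pages_left - pages from the faulting page to the end of the folio
 * nr               - number of PTEs the caller would like to set
 */
static unsigned long clamp_nr_ptes(unsigned long addr, unsigned long vma_end,
				   unsigned long folio_pages_left,
				   unsigned long nr)
{
	/* Pages left before 'addr' crosses into the next PTE table. */
	unsigned long table_left =
		(PMD_SIZE - (addr & (PMD_SIZE - 1))) >> PAGE_SHIFT;
	/* Pages left before the end of the VMA. */
	unsigned long vma_left = (vma_end - addr) >> PAGE_SHIFT;

	nr = min_ul(nr, table_left);
	nr = min_ul(nr, vma_left);
	nr = min_ul(nr, folio_pages_left);
	return nr;
}
```

With a check like this on the caller side, set_ptes() itself never has to
look at the VMA or the folio.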

> We don't currently allocate folios larger than PMD size, but perhaps we
> should prepare for that and as part of this same exercise define
> 
> 	set_pmds(mm, addr, vmf->pmd, entry, nr);
> 
> ... where 'nr' is the number of PMDs to set, not number of pages.

Sounds good to me.
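
For what it's worth, a generic fallback could be little more than a loop.
The following is only a userspace model under loud assumptions: pte_t and
pmd_t are toy structs (the real types are per-architecture), pte_next() and
pmd_next() are hypothetical helpers that assume the physical address sits
directly above the low flag bits, and locking plus the
update_mmu_cache_range() call are left to the caller as discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 4KiB pages, 2MiB per PMD entry. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21

/* Toy stand-ins: the real pte_t/pmd_t are arch-defined, not plain integers. */
typedef struct { uint64_t val; } pte_t;
typedef struct { uint64_t val; } pmd_t;

struct mm_struct;	/* opaque here; unused by this model */

/* Hypothetical: entry mapping the next page, flags unchanged. */
static pte_t pte_next(pte_t pte)
{
	pte.val += UINT64_C(1) << PAGE_SHIFT;
	return pte;
}

/* Hypothetical: entry mapping the next PMD-sized block, flags unchanged. */
static pmd_t pmd_next(pmd_t pmd)
{
	pmd.val += UINT64_C(1) << PMD_SHIFT;
	return pmd;
}

/* Set 'nr' consecutive PTEs; the caller has already bounded 'nr'. */
static void set_ptes(struct mm_struct *mm, unsigned long addr,
		     pte_t *ptep, pte_t entry, unsigned int nr)
{
	(void)mm;	/* a real arch hook might need these two */
	(void)addr;

	for (unsigned int i = 0; i < nr; i++) {
		ptep[i] = entry;
		entry = pte_next(entry);
	}
}

/* Same idea one level up: 'nr' counts PMDs here, not pages. */
static void set_pmds(struct mm_struct *mm, unsigned long addr,
		     pmd_t *pmdp, pmd_t entry, unsigned int nr)
{
	(void)mm;
	(void)addr;

	for (unsigned int i = 0; i < nr; i++) {
		pmdp[i] = entry;
		entry = pmd_next(entry);
	}
}
```

An architecture that wants to mark contiguous ranges specially would
override this rather than use the loop.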

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


