On Thu, Feb 02, 2023 at 10:49:38PM +0000, Matthew Wilcox wrote:
> On Fri, Feb 03, 2023 at 12:48:58AM +0300, Kirill A. Shutemov wrote:
> > On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> > > For those of you not subscribed, linux-mm is currently discussing
> > > how best to handle page faults on large folios.  I simply made it work
> > > when adding large folio support.  Now Yin Fengwei is working on
> > > making it fast.
> > >
> > > https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@xxxxxxxxxxxxxxxxxxxx/
> > > is perhaps the best place to start as it pertains to what the
> > > architecture will see.
> > >
> > > At the bottom of that function, I propose
> > >
> > > +       for (i = 0; i < nr; i++) {
> > > +               set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
> > > +               /* no need to invalidate: a not-present page won't be cached */
> > > +               update_mmu_cache(vma, addr, vmf->pte + i);
> > > +               addr += PAGE_SIZE;
> > > +               entry = pte_next(entry);
> > > +       }
> > >
> > > (or I would have, had I not forgotten that pte_t isn't an integral type)
> > >
> > > But I think that some architectures want to mark PTEs specially for
> > > "This is part of a contiguous range" -- ARM, perhaps?  So would you like
> > > an API like:
> > >
> > >       arch_set_ptes(mm, addr, vmf->pte, entry, nr);
> >
> > Maybe just set_ptes(). arch_ doesn't contribute much.
>
> Sure.
>
> > >       update_mmu_cache_range(vma, addr, vmf->pte, nr);
> > >
> > > There are some challenges here.  For example, folios may be mapped
> > > askew (ie not naturally aligned).  Another problem is that folios may
> > > be unmapped in part (eg mmap(), fault, followed by munmap() of one of
> > > the pages in the folio), and I presume you'd need to go and unmark the
> > > other PTEs in that case.  So it's not as simple as just checking whether
> > > 'addr' and 'nr' are in some way compatible.
> >
> > I think the key question is who is responsible for 'nr' being safe. Like
> > is it caller or set_ptes() need to check that it belong to the same PTE
> > page table, folio, VMA, etc.
> >
> > I think it has to be done by caller and set_pte() has to be as simple as
> > possible.
>
> Caller guarantees that 'nr' is bounded by all of (vma, PMD table, folio).

Also caller is responsible for taking all relevant locks.

> We don't currently allocate folios larger than PMD size, but perhaps we
> should prepare for that and as part of this same exercise define
>
>       set_pmds(mm, addr, vmf->pmd, entry, nr);
>
> ... where 'nr' is the number of PMDs to set, not number of pages.

Sounds good to me.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
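
To make the "caller guarantees 'nr'" contract concrete, here is a small
userspace sketch in C. It is not kernel code and not the proposed
implementation. Everything in it is an assumption for illustration: pte_t
is a toy struct, pte_next() is a stand-in for the helper used in the quoted
loop, this set_ptes() drops the mm and addr arguments of the proposed
signature, and max_safe_nr() is a hypothetical name for the clamping a
caller might do against the VMA, the PMD table and the folio. A PAGE_SHIFT
of 12 and 512 PTEs per table are assumed values, too.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed geometry: 4 KiB pages, 512 PTEs per page table (2 MiB per PMD). */
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define PTRS_PER_PTE 512UL
#define PMD_SIZE     (PTRS_PER_PTE * PAGE_SIZE)

/* Toy pte_t: a struct, because pte_t is not an integral type. */
typedef struct {
	unsigned long pfn;
	unsigned long flags;
} pte_t;

/* Hypothetical stand-in for pte_next(): same flags, next page frame. */
static pte_t pte_next(pte_t pte)
{
	pte.pfn++;
	return pte;
}

/*
 * Toy set_ptes(): kept as simple as possible. It does no bounds
 * checking of its own; the caller has already made 'nr' safe.
 */
static void set_ptes(pte_t *ptep, pte_t entry, unsigned long nr)
{
	assert(nr > 0);
	for (unsigned long i = 0; i < nr; i++) {
		ptep[i] = entry;
		entry = pte_next(entry);
	}
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Caller side (hypothetical helper): clamp 'nr' so the range stays
 * inside the folio, the VMA and a single PTE page table.
 */
static unsigned long max_safe_nr(unsigned long addr, unsigned long vma_end,
				 unsigned long folio_pages_left)
{
	/* First address covered by the next PTE page table. */
	unsigned long pmd_end = (addr & ~(PMD_SIZE - 1)) + PMD_SIZE;
	unsigned long nr = folio_pages_left;

	nr = min_ul(nr, (vma_end - addr) >> PAGE_SHIFT);
	nr = min_ul(nr, (pmd_end - addr) >> PAGE_SHIFT);
	return nr;
}
```

A caller in this model would compute nr = max_safe_nr(...) under whatever
locks it holds and then call set_ptes() once. Note that the sketch does not
touch the harder cases raised above (folios mapped askew, partial unmap).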