On Fri, Feb 03, 2023 at 12:48:58AM +0300, Kirill A. Shutemov wrote:
> On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> > For those of you not subscribed, linux-mm is currently discussing
> > how best to handle page faults on large folios.  I simply made it work
> > when adding large folio support.  Now Yin Fengwei is working on
> > making it fast.
> >
> > https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@xxxxxxxxxxxxxxxxxxxx/
> > is perhaps the best place to start as it pertains to what the
> > architecture will see.
> >
> > At the bottom of that function, I propose
> >
> > +	for (i = 0; i < nr; i++) {
> > +		set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
> > +		/* no need to invalidate: a not-present page won't be cached */
> > +		update_mmu_cache(vma, addr, vmf->pte + i);
> > +		addr += PAGE_SIZE;
> > +		entry = pte_next(entry);
> > +	}
> >
> > (or I would have, had I not forgotten that pte_t isn't an integral type)
> >
> > But I think that some architectures want to mark PTEs specially for
> > "This is part of a contiguous range" -- ARM, perhaps?  So would you like
> > an API like:
> >
> > 	arch_set_ptes(mm, addr, vmf->pte, entry, nr);
>
> Maybe just set_ptes(). arch_ doesn't contribute much.

Sure.

> > 	update_mmu_cache_range(vma, addr, vmf->pte, nr);
> >
> > There are some challenges here.  For example, folios may be mapped
> > askew (ie not naturally aligned).  Another problem is that folios may
> > be unmapped in part (eg mmap(), fault, followed by munmap() of one of
> > the pages in the folio), and I presume you'd need to go and unmark the
> > other PTEs in that case.  So it's not as simple as just checking whether
> > 'addr' and 'nr' are in some way compatible.
>
> I think the key question is who is responsible for 'nr' being safe. Like
> is it caller or set_ptes() need to check that it belong to the same PTE
> page table, folio, VMA, etc.
>
> I think it has to be done by caller and set_pte() has to be as simple as
> possible.

Caller guarantees that 'nr' is bounded by all of (vma, PMD table, folio).

We don't currently allocate folios larger than PMD size, but perhaps
we should prepare for that and as part of this same exercise define

	set_pmds(mm, addr, vmf->pmd, entry, nr);

... where 'nr' is the number of PMDs to set, not number of pages.
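
To make the caller-side contract concrete, here is a rough userspace sketch.
It is not kernel code and not a proposed patch: pte_t is mocked as a small
struct (so it is visibly not an integral type), the constants assume 4KiB
pages with 512 entries per table, and pte_next(), the simplified set_ptes()
signature and max_nr_ptes() are hypothetical names used only for
illustration.  The point is that set_ptes() stays a dumb loop while the
caller clamps 'nr' against the VMA, the table under one PMD entry and the
folio.

```c
#include <assert.h>

/* Mock constants for illustration only; real values are per-architecture. */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PTRS_PER_PTE	512UL
#define PMD_SIZE	(PAGE_SIZE * PTRS_PER_PTE)

/* Mocked as a struct on purpose: pte_t is not an integral type. */
typedef struct {
	unsigned long pfn;
	unsigned long flags;
} pte_t;

/* Hypothetical helper: same flags, next page frame. */
static pte_t pte_next(pte_t pte)
{
	pte.pfn++;
	return pte;
}

/*
 * Sketch of a generic fallback.  It performs no checking at all and
 * relies on the caller having bounded 'nr'.
 */
static void set_ptes(pte_t *ptep, pte_t entry, unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++) {
		ptep[i] = entry;
		entry = pte_next(entry);
	}
}

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical caller-side helper: the largest 'nr' that stays inside
 * the VMA, the page table covering 'addr', and the folio.
 */
static unsigned long max_nr_ptes(unsigned long addr, unsigned long vma_end,
				 unsigned long folio_page_idx,
				 unsigned long folio_nr_pages)
{
	assert(addr < vma_end && folio_page_idx < folio_nr_pages);

	unsigned long to_vma_end = (vma_end - addr) >> PAGE_SHIFT;
	unsigned long table_end = (addr & ~(PMD_SIZE - 1)) + PMD_SIZE;
	unsigned long to_table_end = (table_end - addr) >> PAGE_SHIFT;
	unsigned long to_folio_end = folio_nr_pages - folio_page_idx;

	return min_ul(to_vma_end, min_ul(to_table_end, to_folio_end));
}
```

With these mock values, a 16-page folio faulted three pages before a PMD
boundary yields nr = 3 from max_nr_ptes(), even though neither the VMA nor
the folio ends there.  Folios mapped askew are handled by the same
clamping, without set_ptes() having to know about it.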