On Thu, Feb 02, 2023 at 09:14:23PM +0000, Matthew Wilcox wrote:
> For those of you not subscribed, linux-mm is currently discussing
> how best to handle page faults on large folios. I simply made it work
> when adding large folio support. Now Yin Fengwei is working on
> making it fast.
>
> https://lore.kernel.org/linux-mm/Y9qjn0Y+1ir787nc@xxxxxxxxxxxxxxxxxxxx/
> is perhaps the best place to start as it pertains to what the
> architecture will see.
>
> At the bottom of that function, I propose
>
> +	for (i = 0; i < nr; i++) {
> +		set_pte_at(vma->vm_mm, addr, vmf->pte + i, entry);
> +		/* no need to invalidate: a not-present page won't be cached */
> +		update_mmu_cache(vma, addr, vmf->pte + i);
> +		addr += PAGE_SIZE;
> +		entry = pte_next(entry);
> +	}
>
> (or I would have, had I not forgotten that pte_t isn't an integral type)
>
> But I think that some architectures want to mark PTEs specially for
> "This is part of a contiguous range" -- ARM, perhaps? So would you like
> an API like:
>
> 	arch_set_ptes(mm, addr, vmf->pte, entry, nr);

Maybe just set_ptes(). The arch_ prefix doesn't contribute much.

> 	update_mmu_cache_range(vma, addr, vmf->pte, nr);
>
> There are some challenges here. For example, folios may be mapped
> askew (ie not naturally aligned). Another problem is that folios may
> be unmapped in part (eg mmap(), fault, followed by munmap() of one of
> the pages in the folio), and I presume you'd need to go and unmark the
> other PTEs in that case. So it's not as simple as just checking whether
> 'addr' and 'nr' are in some way compatible.

I think the key question is who is responsible for 'nr' being safe.
Is it the caller, or does set_ptes() need to check that the range
belongs to the same PTE page table, folio, VMA, etc.?

I think it has to be done by the caller, and set_ptes() has to be as
simple as possible.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
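
To illustrate the split of responsibility suggested above, here is a minimal user-space sketch, not kernel code. It assumes 4KiB pages, 512 PTEs per page table, and a PTE modelled as a bare PFN. Every name in it (model_clamp_nr(), model_set_ptes(), the MODEL_* constants) is hypothetical and only mimics the shape of the proposed API. The caller clamps 'nr' so the batch stays within one PTE page table, the VMA, and the folio; the setter then just loops.

```c
/* User-space model only; all names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT   12                       /* assumed 4KiB pages */
#define MODEL_PAGE_SIZE    (1UL << MODEL_PAGE_SHIFT)
#define MODEL_PTRS_PER_PTE 512UL                    /* assumed table size */
#define MODEL_PMD_SIZE     (MODEL_PAGE_SIZE * MODEL_PTRS_PER_PTE)

typedef uint64_t model_pte_t;   /* a PTE is modelled as a bare PFN */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Caller side: the largest batch starting at 'addr' that stays inside
 * one PTE page table, the VMA, and the folio.
 * Assumes addr is page aligned, addr < vma_end, folio_idx < folio_nr.
 */
static unsigned long model_clamp_nr(unsigned long addr, unsigned long vma_end,
				    unsigned long folio_idx,
				    unsigned long folio_nr)
{
	/* First address covered by the next PTE page table. */
	unsigned long table_end = (addr | (MODEL_PMD_SIZE - 1)) + 1;
	unsigned long nr = (table_end - addr) >> MODEL_PAGE_SHIFT;

	nr = min_ul(nr, (vma_end - addr) >> MODEL_PAGE_SHIFT);
	nr = min_ul(nr, folio_nr - folio_idx);
	return nr;
}

/*
 * Setter side: no checks at all, it trusts the caller's 'nr'.
 * Consecutive entries map consecutive PFNs.
 */
static void model_set_ptes(model_pte_t *ptep, model_pte_t first_pfn,
			   unsigned long nr)
{
	for (unsigned long i = 0; i < nr; i++)
		ptep[i] = first_pfn + i;
}
```

With this split, an askew or partially mapped folio only affects what the caller passes in; the setter stays a plain loop that an architecture could replace with something smarter.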