On Tue, Feb 04, 2025 at 09:40:16PM +0100, David Hildenbrand wrote: > Unfortunately not that easy. You can only have parts of a large folio > mapped. It would be PAGE_SIZE * nr_ptes when batching. I see, you are right. > I think this is the wrong approach. We should replace that pagewalk API > usage by something better that hides all the batching. > > The interface would look similar to folio_walk (no callbacks, handling of > locking), but (a) work on ranges; (b) work also on non-folio entries; and > (c) batch all suitable entries in the range. > > Something like a pt_range_walk_start() that returns a "type" (folio range, > migration entries, swap range, ...) + stores other details (range, level, > ptep, ...) in a structure like folio_walk, to then provide mechanisms to > continue (pt_walk_continue()) to walk or abort (pt_walk_done()) it. Similar > to page_vma_mapped_walk(), but not specific to a given page/folio. > > Then, we would simply process the output of that. With the hope that, for > hugetlb it will just batch all cont-pte / cont-pmd entries into a single > return value. > > That will make the R/O walking as in task_mmu.c easier, hopefully. > > Not so much with PTE/PMD modifications, like damon_mkold_ops ... :( But > maybe that just has to be special-cased for hugetlb, somehow ... Ok, let me see if we are on the same page. You are basically saying that we should replace the existing pagewalk API with something similar to what you described above. I have to confess that when you first mentioned this back in July when I posted the RFC, I felt dishearted, because it implies an even bigger surgery. But having felt the mess that dealing with cont-{pmd,pud}s, and the inability of the existing API to do that in a clean way (without having to teach each and every function about that if needed), maybe it is the only way to do this 1) right and 2) clean. I thought that maybe we can get away and to the batching somehow before calling in the callbacks e.g: at walk_{pud,pmd,pte}_range level, but I am not sure whether 1) that is possible and 2) how ugly it would look. So, given that this is not a really urgent matter, something that needs to be fixed asap, maybe the way to go is to create an API that can deal with all that, abstracting all these details. I am willing to take a shot on this, if we are clear that it makes sense to pursue this road. -- Oscar Salvador SUSE Labs