On Wed, Aug 07, 2019 at 04:32:51PM +0100, Steven Price wrote:
> On 07/08/2019 15:56, Matthew Wilcox wrote:
> > On Wed, Aug 07, 2019 at 03:30:38PM +0100, Steven Price wrote:
> >> On 07/08/2019 15:15, Matthew Wilcox wrote:
> >>> On Tue, Aug 06, 2019 at 11:40:00PM -0700, Christoph Hellwig wrote:
> >>>> On Tue, Aug 06, 2019 at 12:09:38PM -0700, Matthew Wilcox wrote:
> >>>>> Has anyone looked at turning the interface inside-out?  ie something like:
> >>>>>
> >>>>>     struct mm_walk_state state = { .mm = mm, .start = start, .end = end, };
> >>>>>
> >>>>>     for_each_page_range(&state, page) {
> >>>>>         ... do something with page ...
> >>>>>     }
> >>>>>
> >>>>> with appropriate macrology along the lines of:
> >>>>>
> >>>>> #define for_each_page_range(state, page) \
> >>>>>     while ((page = page_range_walk_next(state)))
> >>>>>
> >>>>> Then you don't need to package anything up into structs that are shared
> >>>>> between the caller and the iterated function.
> >>>>
> >>>> I'm not an all that huge fan of super magic macro loops.  But in this
> >>>> case I don't see how it could even work, as we get special callbacks
> >>>> for huge pages and holes, and people are trying to add a few more ops
> >>>> as well.
> >>>
> >>> We could have bits in the mm_walk_state which indicate what things to return
> >>> and what things to skip.  We could (and probably should) also use different
> >>> iterator names if people actually want to iterate different things.  eg
> >>> for_each_pte_range(&state, pte) as well as for_each_page_range().
> >>>
> >>
> >> The iterator approach could be awkward for the likes of my generic
> >> ptdump implementation[1]. It would require an iterator which returns all
> >> levels and allows skipping levels when required (to prevent KASAN
> >> slowing things down too much). So something like:
> >>
> >> start_walk_range(&state);
> >> for_each_page_range(&state, page) {
> >>     switch(page->level) {
> >>     case PTE:
> >>         ...
> >>     case PMD:
> >>         if (...)
> >>             skip_pmd(&state);
> >>         ...
> >>     case HOLE:
> >>         ....
> >>     ...
> >>     }
> >> }
> >> end_walk_range(&state);
> >>
> >> It seems a little fragile - e.g. we wouldn't (easily) get type checking
> >> that you are actually treating a PTE as a pte_t. The state mutators like
> >> skip_pmd() also seem a bit clumsy.
> >
> > Once you're on-board with using a state structure, you can use it in all
> > kinds of fun ways.  For example:
> >
> > struct mm_walk_state {
> >     struct mm_struct *mm;
> >     unsigned long start;
> >     unsigned long end;
> >     unsigned long curr;
> >     p4d_t p4d;
> >     pud_t pud;
> >     pmd_t pmd;
> >     pte_t pte;
> >     enum page_entry_size size;
> >     int flags;
> > };
> >
> > For this user, I'd expect something like ...
> >
> > DECLARE_MM_WALK_FLAGS(state, mm, start, end,
> >         MM_WALK_HOLES | MM_WALK_ALL_SIZES);
> >
> > walk_each_pte(state) {
> >     switch (state->size) {
> >     case PE_SIZE_PTE:
> >         ...
> >     case PE_SIZE_PMD:
> >         if (...(state->pmd))
> >             continue;
>
> You need to be able to signal whether you want to descend into the PMD
> or skip the entire part of the tree. This was my skip_pmd() function above.

Do you?  My assumption was that if there's a PMD entry, you either want to
be called once for the entire PMD entry, or 512 times for each PTE entry
that would be in the PMD if it hadn't been collapsed, and you could indicate
this through a flag in the state.  Is it more dynamic than that for some
users?

In any case, we could have a skip_pmd(&state) function; I'm just not sure
we need it.

> >     ...
> >     }
> > }
> >
> > There's no need to have start / end walk function calls.
>
> You've got a start walk function (it's your DECLARE_MM_WALK_FLAGS
> above). The end walk I agree I think you don't actually need it since
> struct mm_walk_state contains all the state.

Ah, I misunderstood what you meant.
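
As a rough illustration of the "once per PMD entry, or 512 times per PTE,
chosen by a flag in the state" idea above, here is a minimal userspace
sketch.  It is not kernel code and not anyone's proposed implementation:
toy_pmd, walk_state, WALK_SPLIT_PMD, walk_init(), walk_next(),
walk_each_entry() and count_entries() are all hypothetical names invented
for this example, holes are simply skipped, and every slot of a non-huge
PMD is assumed to hold a present PTE.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
enum entry_size { SIZE_PTE, SIZE_PMD };

#define WALK_SPLIT_PMD 0x1  /* report a huge PMD as its individual PTE slots */
#define PTES_PER_PMD   512  /* the "512 times" figure from the mail */

struct toy_pmd {
    int present;            /* 0 means a hole */
    int huge;               /* 1 means one PMD-sized mapping */
};

struct walk_state {
    const struct toy_pmd *pmds;
    size_t nr_pmds;
    size_t pmd_idx;         /* PMD slot currently being walked */
    size_t pte_idx;         /* PTE slots already reported in that PMD */
    enum entry_size size;   /* size of the entry just returned */
    int flags;
};

static void walk_init(struct walk_state *st, const struct toy_pmd *pmds,
                      size_t nr_pmds, int flags)
{
    assert(pmds != NULL || nr_pmds == 0);
    st->pmds = pmds;
    st->nr_pmds = nr_pmds;
    st->pmd_idx = 0;
    st->pte_idx = 0;
    st->size = SIZE_PTE;
    st->flags = flags;
}

/* Advance to the next entry; returns 0 once the range is exhausted. */
static int walk_next(struct walk_state *st)
{
    while (st->pmd_idx < st->nr_pmds) {
        const struct toy_pmd *pmd = &st->pmds[st->pmd_idx];

        if (!pmd->present) {            /* hole: skip it entirely */
            st->pmd_idx++;
            continue;
        }
        if (pmd->huge && !(st->flags & WALK_SPLIT_PMD)) {
            st->size = SIZE_PMD;        /* one call for the whole PMD */
            st->pmd_idx++;
            return 1;
        }
        if (st->pte_idx < PTES_PER_PMD) {
            st->size = SIZE_PTE;        /* one call per PTE slot */
            st->pte_idx++;
            return 1;
        }
        st->pte_idx = 0;                /* PMD finished, move on */
        st->pmd_idx++;
    }
    return 0;
}

#define walk_each_entry(st) while (walk_next(st))

/* Example caller: count how many entries of each size the walk reports. */
static void count_entries(const struct toy_pmd *pmds, size_t n, int flags,
                          size_t *ptes, size_t *pmd_entries)
{
    struct walk_state st;

    *ptes = 0;
    *pmd_entries = 0;
    walk_init(&st, pmds, n, flags);
    walk_each_entry(&st) {
        switch (st.size) {
        case SIZE_PTE:
            (*ptes)++;
            break;
        case SIZE_PMD:
            (*pmd_entries)++;
            break;
        }
    }
}
```

Note that in this sketch the flag is fixed for the whole walk, which is
exactly the static case; a caller that needs to decide per PMD whether to
descend would still want something like skip_pmd(&state).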