Replacing walk_page_range

Sid was incautious enough to say he'd like to take on fixing
walk_page_range() so that hugetlb isn't treated specially.  This is
going to subject him to one of my rants, so I thought I'd share with
everyone before we meet to talk about it later today.

1. I dislike the callback approach.  Indirect function calls are not
cheap (thanks Spectre!) and it forces separation of code into two
functions, often necessitating some awkward passing of state between
them through the mm_walk->private void pointer.

2. If you want to support PUDs, and the page tables happen to contain PTEs,
you get passed a PUD, even though you need to do the ACTION_CONTINUE.

3. There's separate handling for hugetlb, even though there really
shouldn't be.

4. It's not used everywhere.  unmap_page_range() open-codes the page
table walk.  It's so hard to use that ptdump_walk_pgd() wraps it
and has its own callback!

5. There's no help for systems with cont_pte/cont_pmd.  You have to
manage that yourself.
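
To make point 1 concrete, here is a toy userspace sketch of the callback
pattern.  Nothing in it is kernel code: toy_walk, toy_walk_ops,
toy_walk_range() and count_state are hypothetical names, and the "page
table" is just a flat array where 0 means not present.  It only shows
the shape of the problem: one indirect call per entry, and caller state
that has to travel through a void pointer and be cast back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct mm_walk; not the kernel type. */
struct toy_walk {
	const struct toy_walk_ops *ops;
	void *private;		/* caller state, passed through a void pointer */
};

struct toy_walk_ops {
	/* Called once per present entry; non-zero return aborts the walk. */
	int (*entry)(unsigned long val, unsigned long addr,
		     struct toy_walk *walk);
};

/* Toy "page table": a flat array, one slot per page; 0 means not present. */
static int toy_walk_range(const unsigned long *table, unsigned long start,
			  unsigned long end, struct toy_walk *walk)
{
	unsigned long addr;
	int err;

	assert(start <= end);
	for (addr = start; addr < end; addr++) {
		if (!table[addr])
			continue;
		/* One indirect call per present entry. */
		err = walk->ops->entry(table[addr], addr, walk);
		if (err)
			return err;
	}
	return 0;
}

/* State the caller wants to accumulate across callbacks. */
struct count_state {
	unsigned long present;
	unsigned long sum;
};

/* The loop body lives here, away from the code that owns the state. */
static int count_entry(unsigned long val, unsigned long addr,
		       struct toy_walk *walk)
{
	struct count_state *st = walk->private;	/* back from void * */

	(void)addr;
	st->present++;
	st->sum += val;
	return 0;
}

struct count_state count_present(const unsigned long *table,
				 unsigned long start, unsigned long end)
{
	static const struct toy_walk_ops ops = { .entry = count_entry };
	struct count_state st = { 0, 0 };
	struct toy_walk walk = { .ops = &ops, .private = &st };

	toy_walk_range(table, start, end, &walk);
	return st;
}
```

Even in this tiny case the logic is split across count_entry() and
count_present(), with count_state existing only to carry state between
them.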


I think a new interface looks like ...

	struct folio *folio;
	PAGEWALK(my_walk, mm, start, end);

	pagewalk_for_each_folio(folio, &my_walk) {
		... do things with folio ...
	}

but I haven't spent enough time looking at all the consumers to be
sure that it'll work.  I think there are other consumers that want
to do things that aren't per-folio iterations, such as

	pagewalk_for_each_pfn(pfn, &my_walk) {
		... do things with pfn ...
	}
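
As a rough illustration of how such an iterator could be built, here is
a userspace toy.  It is a sketch under heavy assumptions, not a proposed
implementation: struct pagewalk and pagewalk_next_pfn() are made-up
names, the "page table" is a flat array where 0 means not present, and
this PAGEWALK() takes that array where the real one would take an mm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical walk state; a real one would track page table levels. */
struct pagewalk {
	const unsigned long *table;	/* toy "page table": 0 = not present */
	unsigned long addr;		/* next index to examine */
	unsigned long end;		/* exclusive end of the range */
};

/* Declare and initialise a walk on the stack (toy: a table, not an mm). */
#define PAGEWALK(name, tbl, start, stop) \
	struct pagewalk name = { .table = (tbl), .addr = (start), .end = (stop) }

/* Advance to the next present entry; returns false once the range is done. */
bool pagewalk_next_pfn(struct pagewalk *w, unsigned long *pfn)
{
	assert(w->addr <= w->end);
	while (w->addr < w->end) {
		unsigned long val = w->table[w->addr++];

		if (val) {
			*pfn = val;
			return true;
		}
	}
	return false;
}

/* The loop body is ordinary inline code: no callback, no void pointer. */
#define pagewalk_for_each_pfn(pfn, w) \
	while (pagewalk_next_pfn((w), &(pfn)))

unsigned long sum_pfns(const unsigned long *table, unsigned long start,
		       unsigned long end)
{
	unsigned long pfn, sum = 0;
	PAGEWALK(my_walk, table, start, end);

	pagewalk_for_each_pfn(pfn, &my_walk)
		sum += pfn;	/* caller's locals are directly in scope */
	return sum;
}
```

The point of the shape is that sum stays an ordinary local variable in
the caller, and the only function call per step is a direct one.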

I think the pre-vma, post-vma hooks can be replaced by remembering what
the prev vma was, and doing whatever is needed if current vma is
different from prev vma.
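
The prev-vma idea can be sketched in the same toy style.  Everything
here is hypothetical (toy_vma, toy_find_vma(), vma_stats); the walk is
a plain loop over page indices, and the enter/leave counters stand in
for whatever work the hooks would have done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: a half-open range of page indices. */
struct toy_vma {
	unsigned long start, end;
};

/* Stand-in for per-VMA work; here we only count transitions. */
struct vma_stats {
	unsigned int entered;
	unsigned int left;
};

/* Linear lookup, standing in for whatever lookup the walker would use. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
					  size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].start && addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

struct vma_stats walk_tracking_vma(const struct toy_vma *vmas, size_t n,
				   unsigned long start, unsigned long end)
{
	struct vma_stats st = { 0, 0 };
	const struct toy_vma *prev = NULL;

	assert(start <= end);
	for (unsigned long addr = start; addr < end; addr++) {
		const struct toy_vma *vma = toy_find_vma(vmas, n, addr);

		if (vma != prev) {
			if (prev)
				st.left++;	/* work a post-vma hook did */
			if (vma)
				st.entered++;	/* work a pre-vma hook did */
			prev = vma;
		}
		/* ... per-address work would go here ... */
	}
	if (prev)
		st.left++;	/* close out the last VMA after the loop */
	return st;
}
```

One wrinkle this shows: the caller needs an extra check after the loop
to finish off the final vma, which the post-vma hook otherwise handles.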



