On Wed, Jul 10, 2024 at 05:52:43AM +0200, David Hildenbrand wrote:
> I understand that. And it would all be easier+more straight forward if we
> wouldn't have that hugetlb CONT-PTE / CONT-PMD stuff in there that works
> similar, but different to "ordinary" cont-pte for thp.
>
> I'm sure you stumbled over the set_huge_pte_at() on arm64 for example. If
> we, at one point *don't* use these hugetlb functions right now to modify
> hugetlb entries, we might be in trouble.
>
> That's why I think we should maybe invest our time and effort in having a
> new pagewalker that will just batch such things naturally, and users that
> can operate on that naturally. For example: a hugetlb cont-pte-mapped folio
> will just naturally be reported as a "fully mapped folio", just like a THP
> would be if mapped in a compatible way.
>
> Yes, this requires more work, but as raised in some patches here, working on
> individual PTEs/PMDs for hugetlb is problematic.
>
> You have to batch every operation, to essentially teach ordinary code to do
> what the hugetlb_* special code would have done on cont-pte/cont-pmd things.
>
>
> (as a side note, cont-pte/cont-pmd should primarily be a hint from arch code
> on how many entries we can batch, like we do in folio_pte_batch(); point is
> that we want to batch also on architectures where we don't have such bits,
> and prepare for architectures that implement various sizes of batching;
> IMHO, having cont-pte/cont-pmd checks in common code is likely the wrong
> approach. Again, folio_pte_batch() is where we tackled the problem
> differently from the THP perspective)

I must say I did not check folio_pte_batch() and I am totally ignorant of
what/how it does things.
I will have a look.

> I have an idea for a better page table walker API that would try batching
> most entries (under one PTL), and walkers can just register for the types
> they want. Hoping I will find some time to at least scetch the user
> interface soon.
>
> That doesn't mean that this should block your work, but the
> cont-pte/cont/pmd hugetlb stuff is really nasty to handle here, and I don't
> particularly like where this is going.

Ok, let me take a step back then.
Previous versions of that RFC did not handle cont-{pte-pmd} wide in the
open, so let me go back to the drawing board and come up with something
that does not fiddle with cont- stuff in that way.

I might post here a small diff just to see if we are on the same page.

As usual, thanks a lot for your comments David!

-- 
Oscar Salvador
SUSE Labs