On Thu, Jul 11, 2024 at 02:15:38AM +0200, David Hildenbrand wrote:
> > > (as a side note, cont-pte/cont-pmd should primarily be a hint from arch code
> > > on how many entries we can batch, like we do in folio_pte_batch(); point is
> > > that we want to batch also on architectures where we don't have such bits,
> > > and prepare for architectures that implement various sizes of batching;
> > > IMHO, having cont-pte/cont-pmd checks in common code is likely the wrong
> > > approach. Again, folio_pte_batch() is where we tackled the problem
> > > differently from the THP perspective)
> >
> > I must say I did not check folio_pte_batch() and I am totally ignorant
> > of what/how it does things.
> > I will have a look.
> >
> > > I have an idea for a better page table walker API that would try batching
> > > most entries (under one PTL), and walkers can just register for the types
> > > they want. Hoping I will find some time to at least sketch the user
> > > interface soon.
> > >
> > > That doesn't mean that this should block your work, but the
> > > cont-pte/cont-pmd hugetlb stuff is really nasty to handle here, and I don't
> > > particularly like where this is going.
> >
> > Ok, let me take a step back then.
> > Previous versions of that RFC did not handle cont-{pte,pmd} wide in the
> > open, so let me go back to the drawing board and come up with something
> > that does not fiddle with cont- stuff in that way.
> >
> > I might post here a small diff just to see if we are on the same page.
> >
> > As usual, thanks a lot for your comments David!
>
> Feel free to reach out to discuss ways forward. I think we should
>
> (a) move to the automatic cont-pte setting as done for THPs via
>     set_ptes().
> (b) Batching PTE updates at all relevant places, so we get no change in
>     behavior: cont-pte bit will remain set.
> (c) Likely remove the use of cont-pte bits in hugetlb code for anything
>     that is not a present folio (i.e., where automatic cont-pte bit
>     setting would never set it). Migration entries might require
>     thought (we can easily batch to achieve the same thing, but the
>     behavior of hugetlb likely differs to the generic way of handling
>     migration entries on multiple ptes: reference the folio vs.
>     the respective subpages of the folio).

Uhm, I see, but I am a bit confused.
Although related, this seems orthogonal to this series and more like
a next thing to do, right?

It is true that this series tries to handle cont-{pmd,pte} in the pagewalk
API for hugetlb vmas, but in order to raise fewer eyebrows I can come up
with a way not to do that for now, so we do not fiddle with cont- stuff in
this series.

Or am I misunderstanding you?

--
Oscar Salvador
SUSE Labs