On 29.08.22 15:17, Rik van Riel wrote:
> On Mon, 2022-08-29 at 12:02 +0200, David Hildenbrand wrote:
>> On 26.08.22 23:18, Rik van Riel wrote:
>>> On Fri, 2022-08-26 at 12:18 +0200, David Hildenbrand wrote:
>>>> On 25.08.22 23:30, alexlzhu@xxxxxx wrote:
>>>>> From: Alexander Zhu <alexlzhu@xxxxxx>
>>>
>>> I could see wanting to maybe consolidate the scanning between
>>> KSM and this thing at some point, if it could be done without
>>> too much complexity, but keeping this change to split_huge_page
>>> looks like it might make sense even when KSM is enabled, since
>>> it will get rid of the unnecessary memory much faster than KSM
>>> could.
>>>
>>> Keeping a hundred MB of unnecessary memory around for longer
>>> would simply result in more THPs getting split up, and more
>>> memory pressure for a longer time than we need.
>>
>> Right. I was wondering if we want to map the shared zeropage instead of
>> the "detected to be zero" page, similar to how KSM would do it. For
>> example, with userfaultfd there would be an observable difference.
>>
>> (maybe that's already done in this patch set)
>>
> The patch does not currently do that, but I suppose it could?
>

It would be interesting to know why KSM decided to replace the mapped
page with the shared zeropage instead of dropping the page and letting
the next read fault populate the shared zeropage. That code predates
userfaultfd IIRC.

> What exactly are the userfaultfd differences here, and how does
> dropping 4kB pages break things vs. using the shared zeropage?

Once userfaultfd (missing mode) is enabled on a VMA:

1) khugepaged will no longer collapse pte_none(pteval), independent of
khugepaged_max_ptes_none setting -- see __collapse_huge_page_isolate.
[it will also not collapse zeropages, but I recall that that's not
actually required]

So it will not close holes, because the user space fault handler is in
charge of making a decision when something will get mapped there and
with which content.

2) Page faults will no longer populate a THP -- the user space handler
is notified instead and has to decide how the fault will be resolved
(place pages).

If you unmap something (resulting in pte_none()) where previously
something used to be mapped in a page table, you might suddenly inform
the user space fault handler about a page fault that it doesn't expect,
because it previously placed a page and did not zap that page itself
(MADV_DONTNEED).

So at least with userfaultfd I think we have to be careful. Not sure if
there are other corner cases (again, KSM behavior is interesting)

-- 
Thanks,

David / dhildenb