On Fri 02-02-24 21:46:45, Lance Yang wrote:
> Here is a part from the man page explaining
> the MADV_FREE semantics:
>
> The kernel can thus free these pages, but the
> freeing could be delayed until memory pressure
> occurs. For each of the pages that has been
> marked to be freed but has not yet been freed,
> the free operation will be canceled if the caller
> writes into the page. If there is no subsequent
> write, the kernel can free the pages at any time.
>
> IIUC, if there is no subsequent write, lazyfree
> pages will eventually be reclaimed.

If there is no memory pressure then this might not ever happen. The user
cannot make any assumption about their content once the madvise call has
been done. The content has to be considered lost. Sure, userspace
might have means to tell those pages from zero pages and recheck after
the write, but that is about it.

> khugepaged
> treats lazyfree pages the same as pte_none,
> avoiding copying them to the new huge page
> during collapse. It seems that lazyfree pages
> are reclaimed before khugepaged collapses them.
> This aligns with user expectations.
>
> However, IMO, if the content of MADV_FREE pages
> remains valid during collapse, then khugepaged
> treating lazyfree pages the same as pte_none
> might not be suitable.

Why?

Unless I am missing something (which is possible of course) I do not
really see why dropping the content of those pages and replacing them
with a THP is any different from reclaiming those pages and then
faulting in a non-THP zero page.

Now, if khugepaged reused the original content of MADV_FREE pages that
would be a slightly different story. I can see why users would expect
zero pages to back the madvised area.
-- 
Michal Hocko
SUSE Labs
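
A minimal userspace sketch of the semantics discussed above, assuming Linux and Python 3.8 or later (where `mmap.madvise` and `mmap.MADV_FREE` are exposed). The helper name `lazyfree_demo` and the fill values are hypothetical and chosen only for illustration. It shows that, after MADV_FREE, a read may return either the old content or zeroes, and that only data written after the call can be relied upon.

```python
import mmap

PAGE = mmap.PAGESIZE


def lazyfree_demo():
    """Return (byte read after MADV_FREE, byte just written, untouched byte)."""
    # MADV_FREE applies to private anonymous memory, so request MAP_PRIVATE
    # explicitly instead of relying on the default MAP_SHARED mapping.
    buf = mmap.mmap(-1, PAGE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        buf[:] = b"\xaa" * PAGE          # populate the page with known content

        # Assumption: the platform exposes MADV_FREE. If it does not, or the
        # kernel rejects the advice, the sketch simply carries on without it.
        madv_free = getattr(mmap, "MADV_FREE", None)
        if madv_free is not None:
            try:
                buf.madvise(madv_free)
            except OSError:
                pass

        # From here on the old content must be considered lost: this read
        # may see 0xAA (not yet freed) or 0x00 (already freed).
        after_free = buf[0]

        # A write cancels the pending free for this page, if it was still
        # pending. The written byte is the only content that is guaranteed.
        buf[0] = 0x55
        after_write = buf[0]

        # The rest of the page is still either the old data or zeroes.
        untouched = buf[1]
        return after_free, after_write, untouched
    finally:
        buf.close()


if __name__ == "__main__":
    print(lazyfree_demo())
```

Without memory pressure the first and third values will usually still be the old content, which is exactly why the old content cannot be told apart from "guaranteed" data by observation alone.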