On Wed, Jan 18, 2023 at 10:23:02PM +0100, Michal Hocko wrote:
> On Wed 18-01-23 10:07:17, Minchan Kim wrote:
> > On Wed, Jan 18, 2023 at 06:35:32PM +0100, Michal Hocko wrote:
> > > On Wed 18-01-23 09:09:36, Minchan Kim wrote:
> > > > On Wed, Jan 18, 2023 at 10:10:44AM +0100, Michal Hocko wrote:
> > > > > On Tue 17-01-23 15:16:30, Minchan Kim wrote:
> > > > > > The reclaim_pages MADV_PAGEOUT uses needs to return the number of
> > > > > > pages paged-out successfully, not only the number of reclaimed pages
> > > > > > in the operation because those pages paged-out successfully will be
> > > > > > reclaimed easily at the memory pressure due to asynchronous writeback
> > > > > > rotation(i.e., PG_reclaim with folio_rotate_reclaimable).
> > > > > >
> > > > > > This patch renames the reclaim_pages with paging_out(with hope that
> > > > > > it's clear from operation point of view) and then adds an additional
> > > > > > stat in reclaim_stat to represent the number of paged-out but kept
> > > > > > in the memory for rotation on writeback completion.
> > > > > >
> > > > > > With that stat, madvise_pageout can know how many pages were paged-out
> > > > > > successfully as well as reclaimed. The return value will be used for
> > > > > > statistics in next patch.
> > > > >
> > > > > I really fail to see the reason for the rename and paging_out doesn't
> > > > > even make much sense as a name TBH.
> > > >
> > > > Currently, what we are doing to reclaim memory is
> > > >
> > > > reclaim_folio_list
> > > >   shrink_folio_list
> > > >     if (folio_mapped(folio))
> > > >       try_to_unmap(folio)
> > > >
> > > >     if (folio_test_dirty(folio))
> > > >       pageout
> > > >
> > > > Based on the structure, pageout is just one of way to reclaim memory.
> > > >
> > > > With MADV_PAGEOUT, what user want to know how many pages
> > > > were paged out as they requested(from userspace PoV, how many times
> > > > pages fault happens in future accesses), not the number of reclaimed
> > > > pages shrink_folio_list returns currently.
> > > >
> > > > In the sense, I wanted to distinguish between reclaim and pageout.
> > > >
> > > But MADV_PAGEOUT is documented to trigger memory reclaim in general
> > > not a pageout. Let me quote from the man page
> > > : Reclaim a given range of pages. This is done to free up memory occupied
> > > : by these pages.
> >
> > IMO, we need to change the documentation something like this.
> >
> > : Try to reclaim a given range of pages. The reclaim carries on the
> > unmap pages from address space and then write them out to backing
> > storage. It could help to free up memory occupied by these pages
> > or improve memory reclaim efficiency.
>
> But this is not what the implementation does nor should it be specific
> about what reclaim actually can do. The specific implementation of the
> reclaim is an implementation detail.
>
> > > Sure anonymous pages can be paged out to the swap storage but with the
> > > upcoming multi-tiering it can be also "paged out" to a lower tier. All
> > > that leads to freeing up memory that is currently mapped by that address
> > > range.
> >
> > I am not familiar with multi-tiering. However, thing is the operation
> > of pageout is synchronous or not. If it's synchronous(IOW, when the
> > pageout returns, the page was really written to the storage), yes,
> > it can reclaim memory. If the backing storage is asynchronous device
> > (which is *major* these days), we cannot reclaim the memory but just
> > wrote the page to the storage with hope it could help reclaim speed
> > at next iteration of reclaim.
>
> I am sorry but I do not follow. Synchronicity of the reclaim should be
> completely irrelevant. Even swapout (pageout from your POV AFAIU) can be
> async or sync.
>
> > > Anyway, what do you actually mean by distinguishing between reclaim and
> > > pageout. Aren't those just two names for the same thing?
> >
> > reclaim is really memory freeing but pageout is just one of the way
> > to achieve the memory freeing, which is not guaranteed depending on
> > backing storage's speed.
>
> Try to think about it some more. Do you really want the MADV_PAGEOUT to
> be so specific about how the memory reclaim is achieved? How do you
> reflect new ways of reclaiming memory - e.g. memory demotion when the
> primary memory gets freed by migrating the content to a slower type of
> memory yet not write it out to ultra slow swap storage (which is just
> yet another tier that cannot be accessed directly without an explicit
> IO)?

I understand your concern now. I believe a better implementation would
account for the number of virtual addresses scanned and the number of
pages *unmapped from the page table*, so we don't need to worry about
which type of paging out happens (e.g., writing the page to slower
storage or demoting it to a lower tier; in the end, userspace will see
the page-in anyway).

"Unmap the page from the page table and demote the page to a secondary
device. The user would see a page fault on the next access."

If you agree with that, then I don't need to change anything in vmscan.c.
Instead, I could do everything in madvise.c.

Let me know if you have any other concerns or suggestions.

Thanks, Michal.
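
To make the userspace point of view above concrete, here is a minimal
sketch (not part of the patch) of how a process could compare residency
before and after MADV_PAGEOUT. The helper names count_resident() and
pageout_probe() are hypothetical, and mincore() is only a rough proxy for
"unmapped from the page table": what it reports as resident is up to the
kernel, and on a system without swap anonymous pages are expected to stay
resident after the advice.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* Linux uapi value, available since 5.4 */
#endif

/*
 * Count the pages in [addr, addr + len) that mincore() reports as
 * resident. addr must be page-aligned. Returns -1 on error.
 */
static long count_resident(void *addr, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t npages = (len + (size_t)page - 1) / (size_t)page;
	unsigned char *vec = malloc(npages);
	long resident = 0;

	if (!vec)
		return -1;
	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}
	for (size_t i = 0; i < npages; i++)
		resident += vec[i] & 1; /* bit 0: page is resident */
	free(vec);
	return resident;
}

/*
 * Hypothetical helper: map npages anonymous pages, touch them, ask the
 * kernel to page them out, and report residency before and after.
 * *advice_errno is 0 if madvise() succeeded, otherwise its errno
 * (e.g. EINVAL on kernels without MADV_PAGEOUT).
 * Returns 0 on success, -1 if the mapping or mincore() failed.
 */
static int pageout_probe(size_t npages, long *before, long *after,
			 int *advice_errno)
{
	size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xa5, len); /* fault every page in */
	*before = count_resident(buf, len);
	*advice_errno = madvise(buf, len, MADV_PAGEOUT) == 0 ? 0 : errno;
	*after = count_resident(buf, len);

	munmap(buf, len);
	return (*before < 0 || *after < 0) ? -1 : 0;
}
```

A caller could run pageout_probe(16, &before, &after, &err) and treat
before - after as the pages that would need to be paged in again on the
next access, whichever mechanism (swap or demotion) the kernel picked.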