Hello Hugh,

On Sat, Apr 11, 2015 at 02:40:46PM -0700, Hugh Dickins wrote:
> On Wed, 11 Mar 2015, Minchan Kim wrote:
>
> > Basically, MADV_FREE relies on the pte dirty bit to decide whether
> > the VM is allowed to discard the page. However, after a swap-in,
> > the pte pointing to the page is not pte_dirty. So, MADV_FREE checks
> > PageDirty and PageSwapCache for those pages so as not to discard them,
> > because a swapped-in page may still live in the swap cache, or may
> > have PageDirty set once it is removed from the swap cache.
> >
> > The problem here is that an anonymous page can have PageDirty set
> > when it is removed from the swap cache, so the VM cannot treat such
> > pages as freeable even after madvise_free. Look at the example below.
> >
> > ptr = malloc();
> > memset(ptr);
> > ..
> > heavy memory pressure -> swap out all pages
> > ..
> > no more memory pressure, so there are lots of free pages
> > ..
> > var = *ptr; -> swap-in page/remove the page from swapcache. so pte_clean
> >                but SetPageDirty
> >
> > madvise_free(ptr);
> > ..
> > ..
> > heavy memory pressure -> VM cannot discard the page because of PageDirty.
> >
> > PageDirty for an anonymous page aims to avoid a duplicate swap-out.
> > In other words, if a page has been swapped in but still lives in the
> > swap cache (i.e., !PageDirty), we can save a swap-out if the VM later
> > selects the page as a victim, because the swap device still holds the
> > previously swapped-out contents of the page.
> >
> > So, rather than relying on PG_dirty to make madvise_free work,
> > pte_dirty is more straightforward. Inherently, a swapped-out page was
> > pte_dirty, so this patch restores the dirtiness when the swap-in fault
> > happens, and madvise_free no longer relies on PageDirty.
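To make the rule in the quoted description concrete, here is a rough user-space model in C. Everything in it (struct page_model, can_discard_before(), can_discard_after(), swap_in_and_uncache(), madvise_free_model() and scenario_discardable()) is hypothetical: it only mirrors the three flags the description talks about (pte_dirty, PG_dirty, PG_swapcache) for a single private page. It is not kernel code, it ignores shared pages, KSM, THP, memcg, tmpfs and swapoff, and it says nothing about why the real patch SIGSEGVs under Hugh's testing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one anonymous page; NOT kernel data structures. */
struct page_model {
    bool pte_dirty;   /* dirty bit in the pte mapping the page */
    bool page_dirty;  /* PG_dirty on the page itself */
    bool swapcache;   /* PG_swapcache: a swap slot still holds a copy */
};

/* Discard check as the description says it worked before the patch:
 * PageDirty or PageSwapCache also blocks discarding a lazily freed page. */
static bool can_discard_before(const struct page_model *p)
{
    return !p->pte_dirty && !p->page_dirty && !p->swapcache;
}

/* Discard check the patch aims for: the pte dirty bit alone decides. */
static bool can_discard_after(const struct page_model *p)
{
    return !p->pte_dirty;
}

/* Swap-in fault that also drops the page from the swap cache.
 * Before the patch the pte stays clean but PG_dirty is set;
 * with the patch the pte is marked dirty as well. */
static void swap_in_and_uncache(struct page_model *p, bool patched)
{
    p->swapcache = false;
    p->page_dirty = true;
    p->pte_dirty = patched;
}

/* In this model, madvise_free() only clears the pte dirty bit. */
static void madvise_free_model(struct page_model *p)
{
    p->pte_dirty = false;
}

/* Replays the scenario from the description and reports whether
 * reclaim could discard the page under the final memory pressure. */
bool scenario_discardable(bool patched)
{
    /* Page just read back from swap: clean pte, still in the swap cache. */
    struct page_model p = { .pte_dirty = false, .page_dirty = false,
                            .swapcache = true };

    swap_in_and_uncache(&p, patched);  /* var = *ptr; */
    madvise_free_model(&p);            /* madvise_free(ptr); */
    assert(!p.pte_dirty);              /* pte is clean after MADV_FREE */

    return patched ? can_discard_after(&p) : can_discard_before(&p);
}
```

In this model, scenario_discardable(false) returns false (the PG_dirty left by the swap-in blocks the discard) and scenario_discardable(true) returns true, which is the behaviour change the description is after; whether the real change is safe for every other user of anon PageDirty is exactly what the model cannot show.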
> >
> > Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> > Cc: Cyrill Gorcunov <gorcunov@xxxxxxxxx>
> > Cc: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
> > Reported-by: Yalin Wang <yalin.wang@xxxxxxxxxxxxxx>
> > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
>
> Sorry, but NAK to this patch,
> mm-make-every-pte-dirty-on-do_swap_page.patch in akpm's mm tree
> (I hope it hasn't reached linux-next yet).
>
> You may well be right that pte_dirty<->PageDirty can be handled
> differently, in a way more favourable to MADV_FREE. And this patch
> may be a step in the right direction, but I've barely given it thought.
>
> As it stands, it segfaults more than any patch I've seen in years:
> I just tried applying it to 4.0-rc7-mm1, and running kernel builds
> in low memory with swap. Even if I leave KSM out, and memcg out, and
> swapoff out, and THP out, and tmpfs out, it still SIGSEGVs very soon.
>
> I have a choice: spend a few hours tracking down the errors, and
> post a fix patch on top of yours? But even then I'd want to spend
> a lot longer thinking through every dirty/Dirty in the source before
> I'd feel comfortable to give an ack.
>
> This is users' data, and we need to be very careful with it: errors
> in MADV_FREE are one thing, for now that's easy to avoid; but in this
> patch you're changing the rules for Anon PageDirty for everyone.
>
> I think for now I'll have to leave it to you to do much more source
> diligence and testing, before coming back with a corrected patch for
> us then to review, slowly and carefully.

Sorry, my bad. I will keep your advice in mind.
I will investigate the problem as soon as I get back to work after vacation.

Thanks for the review.

--
Kind regards,
Minchan Kim