I didn't have time to read through newer versions of this patch series
but I remember there were concerns about this functionality being pulled
into the page allocator previously, both by me and Mel [1][2]. Have those
been addressed? I do not see an ack from Mel or any other MM people. Is
there really a consensus that we want something like that living in the
allocator?

There has also been a different approach discussed, and from [3]
(referenced by the cover letter) I can only see

: Then Nitesh's solution had changed to the bitmap approach[7]. However it
: has been pointed out that this solution doesn't deal with sparse memory,
: hotplug, and various other issues.

which looks more like something to be done than a fundamental
roadblock.

[1] http://lkml.kernel.org/r/20190912163525.GV2739@xxxxxxxxxxxxxxxxxxx
[2] http://lkml.kernel.org/r/20190912091925.GM4023@xxxxxxxxxxxxxx
[3] http://lkml.kernel.org/r/29f43d5796feed0dec8e8bb98b187d9dac03b900.camel@xxxxxxxxxxxxxxx

On Tue 05-11-19 16:05:47, Andrew Morton wrote:
> From: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>
> Subject: mm: introduce Reported pages
>
> In order to pave the way for free page reporting in virtualized
> environments we will need a way to get pages out of the free lists and
> identify those pages after they have been returned. To accomplish this,
> this patch adds the concept of a Reported Buddy, which is essentially
> meant to just be the Uptodate flag used in conjunction with the Buddy page
> type.
>
> It adds a set of pointers we shall call "reported_boundary" which
> represent the upper boundary between the unreported and reported pages.
> The general idea is that in order for a page to cross from one side of the
> boundary to the other it will need to verify that it went through the
> reporting process. Ultimately a free list has been fully processed when
> the boundary has been moved from the tail all the way up to occupying the
> first entry in the list.
> Without this we would have to manually walk the
> entire page list until we find a page that hasn't been reported. In
> my testing this adds as much as 18% additional overhead which would make
> this unattractive as a solution.
>
> One limitation to this approach is that it is essentially a linear search
> and in the case of the free lists we can have pages added to either the
> head or the tail of the list. In order to place limits on this we only
> allow pages to be added before the reported_boundary instead of adding to
> the tail itself. An added advantage to this approach is that we should be
> reducing the overall memory footprint of the guest as it will be more
> likely to recycle warm pages versus trying to allocate the reported pages
> that were likely evicted from the guest memory.
>
> Since we will only be reporting one zone at a time we keep the boundary
> limited to being defined for just the zone we are currently reporting
> pages from. Doing this we can keep the number of additional pointers
> needed quite small. To flag that the boundaries are in place we use a
> single bit in the zone to indicate that reporting and the boundaries are
> active.
>
> We store the index of the boundary pointer used to track the reported page
> in the page->index value. Doing this we can avoid unnecessary computation
> to determine the index value again. There should be no issues with this
> as the value is unused when the page is in the buddy allocator, and is
> reset as soon as the page is removed from the free list.
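
For readers trying to picture the boundary scheme described above, here
is a minimal user-space sketch in C. It is not code from the patch: every
struct and function name below is hypothetical, it models a single free
list rather than a zone, and it marks entries as reported in place instead
of pulling pages off the list and returning them later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one free list; names are NOT from the patch. */
struct node {
	struct node *prev, *next;
	bool reported;
};

struct free_list {
	struct node head;	/* sentinel: head.next is the first entry */
	struct node *boundary;	/* first reported entry, or &head if none */
};

static void list_init(struct free_list *fl)
{
	fl->head.prev = fl->head.next = &fl->head;
	fl->head.reported = false;
	fl->boundary = &fl->head;	/* boundary starts at the tail */
}

static void insert_before(struct node *n, struct node *pos)
{
	n->prev = pos->prev;
	n->next = pos;
	pos->prev->next = n;
	pos->prev = n;
}

/* Head insertion is unchanged: new entries are always unreported. */
static void add_head(struct free_list *fl, struct node *n)
{
	n->reported = false;
	insert_before(n, fl->head.next);
}

/* "Tail" insertion lands in front of the boundary, not at the true tail. */
static void add_tail(struct free_list *fl, struct node *n)
{
	n->reported = false;
	insert_before(n, fl->boundary);
}

/* Report the last unreported entry and pull the boundary up over it. */
static bool report_one(struct free_list *fl)
{
	struct node *n = fl->boundary->prev;

	if (n == &fl->head)
		return false;	/* boundary is at the first entry: done */
	assert(!n->reported);
	n->reported = true;
	fl->boundary = n;
	return true;
}

/* Removal (allocation) must not leave the boundary dangling. */
static void del_node(struct free_list *fl, struct node *n)
{
	if (fl->boundary == n)
		fl->boundary = n->next;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	n->reported = false;
}
```

In this model the reporting side never walks the list: the next candidate
is always boundary->prev, and the list is fully processed once the
boundary equals head.next. The cost is that every add and delete path has
to know about the boundary, which is where the allocator coupling comes
from.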
>
> Link: http://lkml.kernel.org/r/20191105220219.15144.69031.stgit@localhost.localdomain
> Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: <konrad.wilk@xxxxxxxxxx>
> Cc: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> Cc: Michael S. Tsirkin <mst@xxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: Oscar Salvador <osalvador@xxxxxxx>
> Cc: Pankaj Gupta <pagupta@xxxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxxx>
> Cc: Vlastimil Babka <vbabka@xxxxxxx>
> Cc: Wei Wang <wei.w.wang@xxxxxxxxx>
> Cc: Yang Zhang <yang.zhang.wz@xxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

-- 
Michal Hocko
SUSE Labs