Re: [PATCH v9 0/8] stg mail -e --version=v9 \

Alexander Duyck <alexander.duyck@xxxxxxxxx> · Wed, 11 Sep 2019 08:12:03 -0700

On Wed, Sep 11, 2019 at 4:36 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Tue 10-09-19 14:23:40, Alexander Duyck wrote:
> [...]
> > We don't put any limitations on the allocator other then that it needs to
> > clean up the metadata on allocation, and that it cannot allocate a page
> > that is in the process of being reported since we pulled it from the
> > free_list. If the page is a "Reported" page then it decrements the
> > reported_pages count for the free_area and makes sure the page doesn't
> > exist in the "Boundary" array pointer value, if it does it moves the
> > "Boundary" since it is pulling the page.
>
> This is still a non-trivial limitation on the page allocation from an
> external code IMHO. I cannot give any explicit reason why an ordering on
> the free list might matter (well except for page shuffling which uses it
> to make physical memory pattern allocation more random) but the
> architecture seems hacky and dubious to be honest. It shoulds like the
> whole interface has been developed around a very particular and single
> purpose optimization.

How is this any different then the code that moves a page that will
likely be merged to the tail though?

In our case the "Reported" page is likely going to be much more
expensive to allocate and use then a standard page because it will be
faulted back in. In such a case wouldn't it make sense for us to want
to keep the pages that don't require faults ahead of those pages in
the free_list so that they are more likely to be allocated? All we are
doing with the boundary list is preventing still resident pages from
being deferred behind pages that would require a page fault to get
access to.

> I remember that there was an attempt to report free memory that provided
> a callback mechanism [1], which was much less intrusive to the internals
> of the allocator yet it should provide a similar functionality. Did you
> see that approach? How does this compares to it? Or am I completely off
> when comparing them?
>
> [1] mostly likely not the latest version of the patchset
> http://lkml.kernel.org/r/1502940416-42944-5-git-send-email-wei.w.wang@xxxxxxxxx

There have been a few comparisons between this patch set and the ones
from Wei Wang. In regards to the one you are pointing to the main
difference is that I am not permanently locking memory. Basically what
happens is that the iterator will take the lock, pull a few pages,
release the lock while reporting them, and then take the lock to
return those pages, grab some more, and repeat.

I was actually influenced somewhat by the suggestions that patchset
received, specifically I believe it resembles something like what was
suggested by Linus in response to v35 of that patch set:
https://lore.kernel.org/linux-mm/CA+55aFzqj8wxXnHAdUTiOomipgFONVbqKMjL_tfk7e5ar1FziQ@xxxxxxxxxxxxxx/

Basically where the feature Wei Wang was working on differs from this
patch set is that I need this to run continually, his only needed to
run periodically as he was just trying to identify free pages at a
fixed point in time. My goal is to identify pages that have been freed
since the last time I reported them. To do that I need a flag in the
page to identify those pages, and an iterator in the form of a
boundary pointer so that I can incrementally walk through the list
without losing track of freed pages.