Reposting the feedback inline under Johannes's comments.

> When something like a higher-order allocation drops a large number of
> file pages, it's *intentional* that the pages that were evicted before
> them become less valuable and less likely to be activated on refault.
> There is a finite amount of in-memory LRU space, and the pages that
> have been evicted most recently take precedence because they have the
> highest proven access frequency.

[HZY]: Yes, I agree with the original idea of sacrificing pages with
long access distances when a huge memory demand arises. The problem is
the criterion for selecting those pages. As I commented in the patch,
some pages have a long refault_distance while having a very short
access time in between.

> Of course, when a large amount of the cache that was pushed out in
> between is not re-used again, and doesn't claim its space in memory,
> it would be great if we could then activate the older pages that *are*
> re-used again in their stead. But that would require us being able to
> look into the future. When an old page refaults, we don't know whether
> a younger page is still going to refault with a shorter refault
> distance or not. If it won't, then we were right to activate it. If it
> will refault, then we put something on the active list whose reuse
> frequency is too low to fit into memory, and we thrash the hottest
> pages in the system.

[HZY]: We do NOT use the absolute timestamp at refault time to decide
whether the page is young or old, and thus its position on the LRU. The
criterion I use is to compare the time the page spent out of the cache
against the time needed to shrink the active file list, estimated from
the average refault ratio. I inherit the concept of treating the ACTIVE
file list as a deficit of the INACTIVE list, but use time to avoid the
scenario described in [1] of the patch.

> As Matthew says, you are fairly randomly making refault activations
> more aggressive (especially with that timestamp unpacking bug), and
> while that expectedly boosts workload transition / startup, it comes
> at the cost of disrupting stable states, because you can flood a very
> active in-ram workingset with completely cold cache pages simply
> because they refault uniformly wrt each other.

[HZY]: I analyzed the log obtained from trace_printk. What we activate
has a proven record of a long refault distance but a very short refault
time.

On Wed, Apr 17, 2019 at 7:46 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Wed 17-04-19 19:36:21, Zhaoyang Huang wrote:
> > Sorry for the confusion. What I mean is that the basic idea doesn't
> > change when replacing the refault criterion from refault_distance to
> > a timestamp. But the detailed implementation changed a lot, including
> > bug fixes, an updated way of packing the timestamp, 32-bit/64-bit
> > differentiation, etc. So it makes sense to start a new thread.
>
> Not really. My takeaway from the previous discussion is that Johannes
> has questioned the timestamping approach itself. I wasn't following
> very closely, so I might be wrong here, but if that is really the case
> then it doesn't make much sense to improve the implementation if there
> is no consensus on the approach itself.
>
> --
> Michal Hocko
> SUSE Labs
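
P.S. To make the two criteria under discussion concrete, here is a
minimal sketch. It is illustrative only: the function and parameter
names are hypothetical, the real mm/workingset.c packs the eviction
counter into shadow entries, and the exact time-based formula lives in
the patch, not here.

/*
 * Sketch only -- NOT the actual mm/workingset.c code or the patch.
 */
#include <stdbool.h>

/*
 * Upstream criterion: compare the refault distance (evictions that
 * happened between this page's eviction and its refault) against the
 * size of the active file list. Unsigned subtraction handles counter
 * wraparound.
 */
static bool activate_by_distance(unsigned long eviction_counter,
				 unsigned long refault_counter,
				 unsigned long nr_active_file)
{
	unsigned long refault_distance = refault_counter - eviction_counter;

	/* The page could fit in memory if the active list were shrunk. */
	return refault_distance <= nr_active_file;
}

/*
 * Criterion described in this thread: compare how long the page was
 * out of the cache with how long it would take to shrink the active
 * file list at the observed average refault ratio. Both values are
 * assumed to be in the same time unit; how they are derived is up to
 * the patch.
 */
static bool activate_by_time(unsigned long time_out_of_cache,
			     unsigned long active_shrink_time)
{
	return time_out_of_cache < active_shrink_time;
}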