Reposting the feedback inline under Johannes's comments.

> When something like a higher-order allocation drops a large number of
> file pages, it's *intentional* that the pages that were evicted before
> them become less valuable and less likely to be activated on refault.
> There is a finite amount of in-memory LRU space, and the pages that
> have been evicted most recently take precedence because they have the
> highest proven access frequency.

[HZY]: Yes, I agree with the original idea of sacrificing pages with
long access distances when a huge memory demand arises. The problem is
the criterion for selecting those pages. As I commented in the patch,
some pages have a long refault_distance while having a very short
access time in between.

> Of course, when a large amount of the cache that was pushed out in
> between is not re-used again, and doesn't claim its space in memory,
> it would be great if we could then activate the older pages that *are*
> re-used again in their stead. But that would require us being able to
> look into the future. When an old page refaults, we don't know whether
> a younger page is still going to refault with a shorter refault
> distance or not. If it won't, then we were right to activate it. If it
> will refault, then we put something on the active list whose reuse
> frequency is too low to fit into memory, and we thrash the hottest
> pages in the system.

[HZY]: We do NOT use the absolute timestamp at refault time to decide
whether the page is young or old, and thus its position on the LRU. The
criterion I use is to compare the time the page spent out of the cache
against the time needed to shrink the active file list, estimated from
the average refault ratio. I inherit the concept of treating the ACTIVE
file list as a deficit of the INACTIVE list, but use time to avoid the
scenario described in [1] of the patch.

> As Matthew says, you are fairly randomly making refault activations
> more aggressive (especially with that timestamp unpacking bug), and
> while that expectedly boosts workload transition / startup, it comes
> at the cost of disrupting stable states, because you can flood a very
> active in-ram workingset with completely cold cache pages simply
> because they refault uniformly wrt each other.

[HZY]: I analyzed the log obtained from trace_printk. What we activate
has a proven record of a long refault distance but a very short refault
time.

On Wed, Apr 17, 2019 at 7:46 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Wed 17-04-19 19:36:21, Zhaoyang Huang wrote:
> > Sorry for the confusion. What I mean is that the basic idea doesn't
> > change when replacing the refault criterion from refault_distance to
> > a timestamp. But the detailed implementation changed a lot, including
> > bug fixes, an updated way of packing the timestamp, 32-bit/64-bit
> > differentiation, etc. So it makes sense to start a new thread.
>
> Not really. My takeaway from the previous discussion is that Johannes
> has questioned the timestamping approach itself. I wasn't following
> very closely, so I might be wrong here, but if that is really the case
> then it doesn't make much sense to improve the implementation if there
> is no consensus on the approach itself.
>
> --
> Michal Hocko
> SUSE Labs
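
P.S. To make the two criteria under discussion concrete, here is a
minimal sketch. It is illustrative only: the function and parameter
names are hypothetical, the real mm/workingset.c packs the eviction
counter into shadow entries, and the exact time-based formula lives in
the patch, not here.

/*
 * Sketch only -- NOT the actual mm/workingset.c code or the patch.
 */
#include <stdbool.h>

/*
 * Upstream criterion: compare the refault distance (evictions that
 * happened between this page's eviction and its refault) against the
 * size of the active file list. Unsigned subtraction handles counter
 * wraparound.
 */
static bool activate_by_distance(unsigned long eviction_counter,
				 unsigned long refault_counter,
				 unsigned long nr_active_file)
{
	unsigned long refault_distance = refault_counter - eviction_counter;

	/* The page could fit in memory if the active list were shrunk. */
	return refault_distance <= nr_active_file;
}

/*
 * Criterion described in this thread: compare how long the page was
 * out of the cache with how long it would take to shrink the active
 * file list at the observed average refault ratio. Both values are
 * assumed to be in the same time unit; how they are derived is up to
 * the patch.
 */
static bool activate_by_time(unsigned long time_out_of_cache,
			     unsigned long active_shrink_time)
{
	return time_out_of_cache < active_shrink_time;
}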