On 25/11/13 10:38, Johannes Weiner wrote: > The VM maintains cached filesystem pages on two types of lists. One > list holds the pages recently faulted into the cache, the other list > holds pages that have been referenced repeatedly on that first list. > The idea is to prefer reclaiming young pages over those that have > shown to benefit from caching in the past. We call the recently used > list "inactive list" and the frequently used list "active list". > > Currently, the VM aims for a 1:1 ratio between the lists, which is the > "perfect" trade-off between the ability to *protect* frequently used > pages and the ability to *detect* frequently used pages. This means > that working set changes bigger than half of cache memory go > undetected and thrash indefinitely, whereas working sets bigger than > half of cache memory are unprotected against used-once streams that > don't even need caching. > > Historically, every reclaim scan of the inactive list also took a > smaller number of pages from the tail of the active list and moved > them to the head of the inactive list. This model gave established > working sets more gracetime in the face of temporary use-once streams, > but ultimately was not significantly better than a FIFO policy and > still thrashed cache based on eviction speed, rather than actual > demand for cache. > > This patch solves one half of the problem by decoupling the ability to > detect working set changes from the inactive list size. By > maintaining a history of recently evicted file pages it can detect > frequently used pages with an arbitrarily small inactive list size, > and subsequently apply pressure on the active list based on actual > demand for cache, not just overall eviction speed. > > Every zone maintains a counter that tracks inactive list aging speed. > When a page is evicted, a snapshot of this counter is stored in the > now-empty page cache radix tree slot. On refault, the minimum access > distance of the page can be assesed, to evaluate whether the page > should be part of the active list or not. > > This fixes the VM's blindness towards working set changes in excess of > the inactive list. And it's the foundation to further improve the > protection ability and reduce the minimum inactive list size of 50%. > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> > --- <snip> > + * fault ------------------------+ > + * | > + * +--------------+ | +-------------+ > + * reclaim <- | inactive | <-+-- demotion | active | <--+ > + * +--------------+ +-------------+ | > + * | | > + * +-------------- promotion ------------------+ > + * > + * > + * Access frequency and refault distance > + * > + * A workload is trashing when its pages are frequently used but they "thrashing". ~Ryan -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html