On Wed, Oct 27, 2010 at 12:05 PM, Rik van Riel <riel@xxxxxxxxxx> wrote:
> On 10/27/2010 01:21 PM, Ying Han wrote:
>>
>> kswapd's use case of hardware PTE accessed bit is to approximate page LRU. The
>> ActiveLRU demotion to InactiveLRU are not base on accessed bit, while it is only
>> used to promote when a page is on inactive LRU list. All of the state transitions
>> are triggered by memory pressure and thus has weak relationship with respect to
>> time. In addition, hardware already transparently flush tlb whenever CPU context
>> switch processes and given limited hardware TLB resource, the time period in
>> which a page is accessed but not yet propagated to struct page is very small
>> in practice. With the nature of approximation, kernel really don't need to flush TLB
>> for changing PTE's access bit. This commit removes the flush operation from it.
>>
>> Signed-off-by: Ying Han<yinghan@xxxxxxxxxx>
>> Singed-off-by: Ken Chen<kenchen@xxxxxxxxxx>
>
> The reasoning behind the patch makes sense.
>
> However, have you measured any improvements in run time with
> this patch? The VM is already tweaked to minimize the number
> of pages that get aged, so it would be interesting to know
> where you saw issues.

Firstly, not all CPUs flush the TLB on VM switch, and secondly, it
would be theoretically possible to spin and never be able to flush
free pages even if none are ever being touched.

It doesn't have to be an absurdly tiny machine, either. You could
cover a good few megs with TLBs (and a small embedded system could
easily have less than that of mapped memory on its LRU).

I agree the theory is fine, because if the CPU thinks it is worth
keeping a TLB entry around, then it probably knows better than our
stupid LRU :) And TLB flushing can get nasty when we start swapping
a lot with threaded apps.

However, to handle corner cases it should either flush all TLBs once
per *something* [e.g.
every scan priority level above N, or every N pages scanned, etc.],
or start doing the flush versions of the ptep manipulation when
memory pressure is getting high.
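
To make the two options concrete, here is a rough user-space sketch of the decision logic only. It is not kernel code and not part of the patch: `struct scan_state`, `account_scanned()`, `use_flushing_variant()` and both thresholds are hypothetical names and values picked for illustration. It also assumes that a numerically lower scan priority means higher memory pressure, which is how I read the reclaim code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunables; the values are arbitrary, not from the patch. */
#define FLUSH_EVERY_N_PAGES 1024UL /* option 1: one full flush per N pages scanned */
#define FLUSH_PRIORITY_MAX  4      /* option 2: use flushing variants at or below this */

/* Hypothetical per-scanner bookkeeping for option 1. */
struct scan_state {
	unsigned long scanned_since_flush; /* pages scanned since the last full flush */
	unsigned long full_flushes;        /* number of full flushes requested so far */
};

/*
 * Option 1: account nr scanned pages and report whether a full TLB
 * flush is now due. The caller would issue the actual flush.
 */
static bool account_scanned(struct scan_state *s, unsigned long nr)
{
	assert(s != NULL);

	s->scanned_since_flush += nr;
	if (s->scanned_since_flush < FLUSH_EVERY_N_PAGES)
		return false;

	s->scanned_since_flush = 0;
	s->full_flushes++;
	return true;
}

/*
 * Option 2: decide per scan pass whether to use the flushing variant of
 * the accessed-bit test. Assumes lower priority == more pressure.
 */
static bool use_flushing_variant(int priority)
{
	return priority <= FLUSH_PRIORITY_MAX;
}
```

Either check is cheap, so the common case (light pressure, few pages scanned) keeps the non-flushing path that the patch wants, and the flush cost is only paid once reclaim is clearly struggling.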