On 03/04/2015 03:03 PM, Shaohua Li wrote:
> kswapd is per-node based. Sometimes there is an imbalance between nodes:
> node A is full of clean file pages (easy to reclaim), while node B is
> full of anon pages (hard to reclaim). Under memory pressure, kswapd will
> be woken up for both nodes. The kswapd of node B will try to swap, while
> we would prefer to reclaim pages from node A first. The real issue here
> is that we don't have a mechanism to prevent memory allocation from a
> hard-to-reclaim node (node B here) when there is an easy-to-reclaim node
> (node A) to reclaim memory from.
>
> The swap can happen even with swappiness 0. Below is a simple script to
> trigger it. CPUs 1 and 8 are in different nodes, and each node has 72G
> of memory:
>
> truncate -s 70G img
> taskset -c 8 dd if=img of=/dev/null bs=4k
> taskset -c 1 usemem 70G
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 5e8eadd..31b03e6 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1990,7 +1990,7 @@ static void get_scan_count(struct lruvec *lruvec, int swappiness,
>  	 * thrashing file LRU becomes infinitely more attractive than
>  	 * anon pages. Try to detect this based on file LRU size.
>  	 */
> -	if (global_reclaim(sc)) {
> +	if (global_reclaim(sc) && sc->priority < DEF_PRIORITY - 2) {
>  		unsigned long zonefile;
>  		unsigned long zonefree;

What kernel does this apply to? Current upstream does not seem to have
the "sc->priority < DEF_PRIORITY - 2" check, unless I somehow managed to
mess up "git clone" on several systems.
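
For what it's worth, the effect of the extra condition is easy to see in
isolation. Below is a small userspace toy, not kernel code: scan_control
and global_reclaim() are stripped-down stand-ins, and only DEF_PRIORITY
(12 in the kernel) is taken from mainline. It prints for which priorities
the file-LRU heuristic would still apply with and without the proposed
gate, i.e. the patched check only kicks in once kswapd has lowered its
scan priority by more than two steps:

/* Toy model of the proposed "sc->priority < DEF_PRIORITY - 2" gate.
 * Not kernel code; struct scan_control and global_reclaim() are
 * simplified stand-ins for illustration only. */
#include <stdio.h>
#include <stdbool.h>

#define DEF_PRIORITY 12	/* same value as in include/linux/mmzone.h */

/* Stand-in for the real scan_control; only the field we care about. */
struct scan_control {
	int priority;
};

/* Stand-in for global_reclaim(); assume kswapd/global reclaim here. */
static bool global_reclaim(const struct scan_control *sc)
{
	(void)sc;
	return true;
}

int main(void)
{
	struct scan_control sc;

	/* kswapd starts at DEF_PRIORITY and decrements the priority as it
	 * fails to reclaim enough; lower priority means a deeper scan. */
	for (sc.priority = DEF_PRIORITY; sc.priority >= 0; sc.priority--) {
		bool upstream = global_reclaim(&sc);
		bool patched  = global_reclaim(&sc) &&
				sc.priority < DEF_PRIORITY - 2;

		printf("priority %2d: heuristic applies upstream=%d patched=%d\n",
		       sc.priority, upstream, patched);
	}
	return 0;
}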