On Fri, 1 Apr 2011, KOSAKI Motohiro wrote:

> > On Thu, 31 Mar 2011, KOSAKI Motohiro wrote:
> >
> > > 1) zone reclaim doesn't work if the system has multiple node and the
> > >    workload is file cache oriented (eg file server, web server, mail server, et al).
> > >    because zone reclaim make some much free pages than zone->pages_min and
> > >    then new page cache request consume nearest node memory and then it
> > >    bring next zone reclaim. Then, memory utilization is reduced and
> > >    unnecessary LRU discard is increased dramatically.
> >
> > That is only true if the webserver only allocates from a single node. If
> > the allocation load is balanced then it will be fine. It is useful to
> > reclaim pages from the node where we allocate memory since that keeps the
> > dataset node local.
>
> Why?
> Scheduler load balancing only consider cpu load. Then, usually memory
> pressure is no complete symmetric. That's the reason why we got the
> bug report periodically.

The scheduler load balancing also considers caching effects. It does not
consider NUMA effects aside from heuristics though. If processes are randomly
moving around then zone reclaim is not effective. Processes need to stay
mainly on a certain node and memory needs to be allocatable from that node in
order to improve performance. zone_reclaim is useless if you toss processes
around the box.

> btw, when we are talking about memory distance aware reclaim, we have to
> recognize traditional numa (ie external node interconnect) and on-chip
> numa have different performance characteristics. on-chip remote node access
> is not so slow, then elaborated nearest node allocation effort doesn't have
> so much worth. especially, a workload use a lot of short lived object.
> Current zone-reclaim don't have so much issue when using traditional numa
> because it's fit your original design and assumption and administrators of
> such systems have good skill and don't hesitate to learn esoteric knobs.
> But recent on-chip and cheap numa are used for much different people against
> past. therefore new issues and claims were raised.

You can switch NUMA off completely at the BIOS level. Then the distances are
not considered by the OS. If they are not relevant then let's just switch NUMA
off. Managing NUMA distances can cause significant overhead.
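
As a rough illustration of the settings discussed above, the sketch below
reads the current zone reclaim mode and the node distances the OS has been
given. It assumes a Linux system that exposes /proc/sys/vm/zone_reclaim_mode
and /sys/devices/system/node/nodeN/distance; the helper names (read_int,
numa_nodes, node_distances) are made up for this example and are not part of
any kernel or library interface.

```python
import glob
import os

NODE_ROOT = "/sys/devices/system/node"  # assumed sysfs location of NUMA nodes


def read_int(path, default=None):
    """Return the first integer stored in a sysctl/sysfs file, or default."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return default


def numa_nodes(root=NODE_ROOT):
    """Return the sorted node ids found as nodeN directories under root."""
    ids = []
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        try:
            ids.append(int(os.path.basename(path)[len("node"):]))
        except ValueError:
            continue  # skip anything that is not nodeN
    return sorted(ids)


def node_distances(node, root=NODE_ROOT):
    """Return the distance row for one node, or [] if it cannot be read."""
    try:
        with open(os.path.join(root, f"node{node}", "distance")) as f:
            return [int(x) for x in f.read().split()]
    except (OSError, ValueError):
        return []


if __name__ == "__main__":
    # 0 means zone reclaim is off; None means the file is not available here.
    print("zone_reclaim_mode:", read_int("/proc/sys/vm/zone_reclaim_mode"))
    for n in numa_nodes():
        print(f"node{n} distances:", node_distances(n))
```

A single node in the output usually means no NUMA topology is visible to the
OS, in which case there are no remote distances to manage.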