On Mon, 25 Oct 2010, KOSAKI Motohiro wrote:

> > The per zone approach seems to be at variance with how objects are tracked
> > at the slab layer. There is no per zone accounting there. So attempts to
> > do expiration of caches etc at that layer would not work right.
>
> Please define your 'right' behavior ;-)

"Right" here meant no excessive shrink calls for a particular node.

> If we need to discuss 'right' thing, we also need to define how behavior
> is right, I think. slab API itself don't have zone taste. but it implictly
> depend on a zone because buddy and reclaim are constructed on zones and
> slab is constructed on buddy. IOW, every slab object have a home zone.

True, every page has a zone. However, per-cpu caching and NUMA distances
only work per node (or per cache-sharing domain, which may be just a
fraction of a "node"). The slab allocators attempt to keep objects on
queues that are cache hot. For that purpose only the node matters, not
the zone.

> So, which workload or usecause make a your head pain?

The head pain comes from the conflict between object tracking per zone in
the page allocator and per node in the slab allocators.

In general, per-zone object tracking in the page allocator's percpu lists
is not optimal, since it is at variance with how the cpu caches actually
work:

- Cpu caches typically exist per node or per sharing domain (which is not
  reflected in the page allocator at all).

- NUMA distance effects only change for per-node allocations.

The concept of a "zone" is for the benefit of certain legacy drivers that
have limitations on the memory range on which they can perform DMA
operations. With IOMMUs and other modern technology this should no longer
be an issue. And Mel used it to attach a side car (ZONE_MOVABLE) to the
VM ...
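To make the mismatch concrete, here is a minimal userspace sketch (not
kernel code; the zone names, node count, and the shrink_zone() helper are
all made up for illustration). It only models that the page allocator
tracks state per (node, zone) while the slab side tracks state per node,
so a per-zone reclaim request can only be passed down as a per-node one:

/*
 * Toy model of the mismatch: the page allocator keeps per-zone counters,
 * while slab caches keep their lists per node. Everything here is
 * hypothetical and for illustration only, not actual kernel code.
 */
#include <stdio.h>

#define MAX_NODES 2

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

/* Page allocator view: free page counts tracked per (node, zone). */
static long free_pages[MAX_NODES][MAX_NR_ZONES];

/* Slab view: one partial-slab count per node, no zone information at all. */
static long slab_partial[MAX_NODES];

/*
 * A per-zone reclaim request has to collapse its zone argument to the
 * owning node before it can be handed to the slab layer.
 */
static void shrink_zone(int node, enum zone_type zone)
{
	printf("reclaim: node %d zone %d has %ld free pages; "
	       "slab can only shrink node %d (%ld partial slabs)\n",
	       node, (int)zone, free_pages[node][zone],
	       node, slab_partial[node]);
}

int main(void)
{
	free_pages[0][ZONE_DMA]    = 100;
	free_pages[0][ZONE_NORMAL] = 5000;
	slab_partial[0] = 42;

	/* Both calls end up operating on the same per-node slab state. */
	shrink_zone(0, ZONE_DMA);
	shrink_zone(0, ZONE_NORMAL);
	return 0;
}

Two different zones on the same node therefore end up shrinking exactly
the same per-node slab lists, which is the excessive-shrink concern above.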