On Fri, 2017-07-14 at 09:31 +0100, Mel Gorman wrote:
> It may also be only a gain on a limited number of architectures depending
> on exactly how an architecture handles flushing. At the time, batching
> this for x86 in the worst-case scenario where all pages being reclaimed
> were mapped from multiple threads knocked 24.4% off elapsed run time and
> 29% off system CPU, but only on multi-socket NUMA machines. On UMA, it was
> barely noticeable. For some workloads where only a few pages are mapped or
> the mapped pages on the LRU are relatively sparse, it'll make no difference.
>
> The worst-case situation is extremely IPI intensive on x86, where many
> IPIs were being sent for each unmap. It's only worth even considering if
> you see that the time spent sending IPIs for flushes is a large portion
> of reclaim.

Ok, it would be interesting to see how that compares to powerpc with its
HW TLB invalidation broadcasts. We tend to hate them and prefer IPIs in
most cases, but maybe not *this* case... (mostly we find that IPI + local
invalidation is better for large-scale invalidations, such as a full mm
on exit/fork etc.).

In the meantime I found the original commits; we'll dig and see if it's
useful for us.

Cheers,
Ben.
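
[Editorial note: for readers unfamiliar with the batching Mel describes, below
is a minimal conceptual sketch of the idea in plain C. The names and data
structures (flush_batch, batch_note_unmap, batch_flush, NR_CPUS here) are
hypothetical stand-ins, not the kernel's actual interface: rather than issuing
a TLB-shootdown IPI for every page unmapped during reclaim, the reclaim path
only records which CPUs might hold stale entries and issues a single flush
pass for the whole batch.]

/*
 * Conceptual sketch of batched TLB flushing during reclaim (hypothetical
 * names, not the kernel's real interface).  Instead of one IPI per
 * unmapped page, note which CPUs may hold stale TLB entries and flush
 * them once at the end of the batch.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

struct flush_batch {
	bool cpu_needs_flush[NR_CPUS];	/* stands in for a cpumask */
	bool pending;
};

/* Called per unmapped page: cheap bookkeeping only, no IPI yet. */
static void batch_note_unmap(struct flush_batch *b, const bool mm_cpus[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mm_cpus[cpu])
			b->cpu_needs_flush[cpu] = true;
	b->pending = true;
}

/* Called once per reclaim batch: one round of IPIs (or one HW broadcast
 * on architectures that have it) covers every page unmapped so far. */
static void batch_flush(struct flush_batch *b)
{
	if (!b->pending)
		return;
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (b->cpu_needs_flush[cpu])
			printf("flush TLB on cpu %d\n", cpu);	/* one flush per CPU per batch */
		b->cpu_needs_flush[cpu] = false;
	}
	b->pending = false;
}

int main(void)
{
	struct flush_batch b = { 0 };
	bool mm_cpus[NR_CPUS] = { [0] = true, [3] = true };

	/* Many pages unmapped, but the TLB work is only deferred... */
	for (int page = 0; page < 64; page++)
		batch_note_unmap(&b, mm_cpus);

	/* ...and a single flush pass replaces 64 rounds of IPIs. */
	batch_flush(&b);
	return 0;
}

[The deferral point is where the architecture choice Ben raises would
presumably sit: x86 sends the batched round as IPIs, while an architecture
with HW invalidation broadcasts could issue a single broadcast instead.]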