On Fri, Jul 14, 2017 at 2:27 AM, Mel Gorman <mgorman@xxxxxxx> wrote:
> On Fri, Jul 14, 2017 at 07:02:57PM +1000, Benjamin Herrenschmidt wrote:
>> On Fri, 2017-07-14 at 09:31 +0100, Mel Gorman wrote:
>> > It may also be only a gain on a limited number of architectures depending
>> > on exactly how an architecture handles flushing. At the time, batching
>> > this for x86 in the worst-case scenario where all pages being reclaimed
>> > were mapped from multiple threads knocked 24.4% off elapsed run time and
>> > 29% off system CPU but only on multi-socket NUMA machines. On UMA, it was
>> > barely noticeable. For some workloads where only a few pages are mapped or
>> > the mapped pages on the LRU are relatively sparse, it'll make no difference.
>> >
>> > The worst-case situation is extremely IPI intensive on x86 where many
>> > IPIs were being sent for each unmap. It's only worth even considering if
>> > you see that the time spent sending IPIs for flushes is a large portion
>> > of reclaim.
>>
>> Ok, it would be interesting to see how that compares to powerpc with
>> its HW tlb invalidation broadcasts. We tend to hate them and prefer
>> IPIs in most cases but maybe not *this* case .. (mostly we find that
>> IPI + local inval is better for large scale invals, such as full mm on
>> exit/fork etc...).
>>
>> In the meantime I found the original commits, we'll dig and see if it's
>> useful for us.
>>
>
> I would suggest that it is based on top of Andy's work that is currently in
> Linus' tree for 4.13-rc1 as the core/arch boundary is a lot clearer. While
> there is other work pending on top related to mm and generation counters,
> that is primarily important for addressing the race which ppc64 may not
> need if you always flush to clear the accessed bit (or equivalent). The
> main thing to watch for is that if an accessed or young bit is being set
> for the first time that the arch check the underlying PTE and trap if it's
> invalid.
> If that holds and there is a flush when the young bit is cleared
> then you probably do not need the arch hook that closes the race.
>

Ben, if you could read the API in tip:x86/mm + Mel's patch, it would be
fantastic. I'd like to know whether a non-x86 non-mm person can
understand the API (arch_tlbbatch_add_mm, arch_tlbbatch_flush, and
arch_tlbbatch_flush_one_mm) well enough to implement it. I'd also like
to know for real that it makes sense outside of x86.
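
For anyone following the thread without the patches at hand, here is a toy
user-space model of the batching idea discussed above. It is only a sketch
under stated assumptions: every toy_* name is hypothetical and is not the
kernel's arch_tlbbatch_* API, a "CPU mask" is a plain 64-bit word, and an
IPI is modelled by incrementing a counter. It shows why deferring the flush
matters when many unmapped pages share the same set of CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_NR_CPUS 64

struct toy_mm {
	uint64_t cpumask;	/* CPUs that may cache translations for this mm */
};

struct toy_tlb_batch {
	uint64_t cpumask;	/* union of CPUs that still need a flush */
};

/* Number of simulated flush IPIs sent so far. */
static unsigned long toy_ipis_sent;

/* Helper: bit for one CPU number in a mask. */
static uint64_t toy_cpu_bit(unsigned int cpu)
{
	assert(cpu < TOY_NR_CPUS);
	return UINT64_C(1) << cpu;
}

/* Stand-in for sending a flush IPI to each CPU in mask: just count them. */
static void toy_send_flush_ipis(uint64_t mask)
{
	for (; mask; mask &= mask - 1)	/* clear lowest set bit each pass */
		toy_ipis_sent++;
}

/* Unbatched path: flush every CPU of the mm on each unmap. */
static void toy_flush_mm_now(const struct toy_mm *mm)
{
	toy_send_flush_ipis(mm->cpumask);
}

/* Batched path, step 1: only record which CPUs will need a flush. */
static void toy_tlbbatch_add_mm(struct toy_tlb_batch *batch,
				const struct toy_mm *mm)
{
	batch->cpumask |= mm->cpumask;
}

/* Batched path, step 2: one round of IPIs for the whole batch. */
static void toy_tlbbatch_flush(struct toy_tlb_batch *batch)
{
	toy_send_flush_ipis(batch->cpumask);
	batch->cpumask = 0;
}
```

In this model, unmapping 32 pages of an mm that runs on 4 CPUs costs 128
simulated IPIs on the unbatched path and 4 on the batched one. The real
saving depends on how the architecture performs remote invalidation, which
is the open question for powerpc raised above.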