Hi Kirill,

"Kirill A. Shutemov" <kirill@xxxxxxxxxxxxx> writes:

> On Wed, Feb 14, 2018 at 06:05:01PM -0500, Jan Stancek wrote:
>> Hi,
>>
>> mallocstress[1] LTP testcase takes ~5+ minutes to complete
>> on some arm64 systems (e.g. 4 node, 64 CPU, 256GB RAM):
>>   real    7m58.089s
>>   user    0m0.513s
>>   sys     24m27.041s
>>
>> But if I turn off THP ("transparent_hugepage=never") it's a lot faster:
>>   real    0m4.185s
>>   user    0m0.298s
>>   sys     0m13.954s
>>
>
> It's multi-threaded workload. My *guess* is that poor performance is due
> to lack of ARCH_ENABLE_SPLIT_PMD_PTLOCK support on arm64.

In this instance I think the latency is due to the large size of PMD
hugepages and THP=always. But split PMD locks seem like a useful feature
to have for large core count systems.

I'll have a go at enabling this for arm64.

Thanks,
Punit