> -----Original Message-----
> From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Daniel J Blueman
> Sent: Thursday, April 30, 2015 11:10 AM
> Subject: Re: [PATCH 0/13] Parallel struct page initialisation v4
...
> On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're
> seeing stock 4.0 boot in 7136s. This drops to 2159s, or a 70% reduction
> with this patchset. Non-temporal PMD init [1] drops this to 1045s.
>
> Nathan, what do you guys see with the non-temporal PMD patch [1]? Do
> add a sfence at the ende label if you manually patch.
>
...
> [1] https://lkml.org/lkml/2015/4/23/350

From that post:
> +loop_64:
> +	decq %rcx
> +	movnti %rax,(%rdi)
> +	movnti %rax,8(%rdi)
> +	movnti %rax,16(%rdi)
> +	movnti %rax,24(%rdi)
> +	movnti %rax,32(%rdi)
> +	movnti %rax,40(%rdi)
> +	movnti %rax,48(%rdi)
> +	movnti %rax,56(%rdi)
> +	leaq 64(%rdi),%rdi
> +	jnz loop_64

There are some even more efficient instructions available in x86,
depending on the CPU features:
* movnti          8 byte
* movntdq %xmm    16 byte, SSE2
* vmovntdq %ymm   32 byte, AVX
* vmovntdq %zmm   64 byte, AVX-512 (forthcoming)

The last will transfer a full cache line at a time.

For NVDIMMs, the nd pmem driver is also in need of memcpy functions that
use these non-temporal instructions, both for performance and reliability.
We also need to speed up __clear_page and copy_user_enhanced_fast_string
so userspace accesses through the page cache can keep up.
https://lkml.org/lkml/2015/4/2/453 is one of the threads on that topic.
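To illustrate the 16-byte movntdq option from the list above, here is a rough userspace sketch using compiler intrinsics. It is not the code from the patch in [1]: fill_nt64 is a hypothetical name, and the alignment and length requirements are assumptions made to keep the example short. It issues four 16-byte streaming stores per 64-byte cache line and finishes with an sfence, as Daniel recommends for the movnti loop. On a build without SSE2 it falls back to ordinary stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_stream_si128, _mm_set1_epi64x, _mm_sfence */
#endif

/*
 * Hypothetical helper (not from the patch): fill len bytes at dst with
 * the 64-bit pattern val, one 64-byte cache line per loop iteration.
 * Assumes dst is 64-byte aligned and len is a multiple of 64.
 */
static void fill_nt64(void *dst, uint64_t val, size_t len)
{
	assert(((uintptr_t)dst & 63) == 0);
	assert((len & 63) == 0);

#if defined(__SSE2__)
	__m128i v = _mm_set1_epi64x((long long)val);
	__m128i *p = (__m128i *)dst;
	size_t lines = len / 64;

	for (size_t i = 0; i < lines; i++, p += 4) {
		/* Four movntdq stores cover one 64-byte cache line. */
		_mm_stream_si128(p + 0, v);
		_mm_stream_si128(p + 1, v);
		_mm_stream_si128(p + 2, v);
		_mm_stream_si128(p + 3, v);
	}
	/* Order the weakly-ordered streaming stores before later stores. */
	_mm_sfence();
#else
	/* Portable fallback: plain (temporal) 8-byte stores. */
	uint64_t *p = dst;

	for (size_t i = 0; i < len / 8; i++)
		p[i] = val;
#endif
}
```

In the kernel, the %xmm/%ymm variants would additionally need the FPU state saved and restored around the loop (kernel_fpu_begin()/kernel_fpu_end()), which is part of why the movnti version is the simplest to drop in.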
Some results I've gotten there under different cache attributes
(in terms of 4 KiB IOPS):

16-byte movntdq:
UC write iops=697872 (697.872 K)(0.697872 M)
WB write iops=9745800 (9745.8 K)(9.7458 M)
WC write iops=9801800 (9801.8 K)(9.8018 M)
WT write iops=9812400 (9812.4 K)(9.8124 M)

32-byte vmovntdq:
UC write iops=1274400 (1274.4 K)(1.2744 M)
WB write iops=10259000 (10259 K)(10.259 M)
WC write iops=10286000 (10286 K)(10.286 M)
WT write iops=10294000 (10294 K)(10.294 M)

---
Robert Elliott, HP Server Storage