On Tue, Dec 15, 2015 at 05:53:31PM +0000, Luck, Tony wrote:
> My current generation cpu has a bit of an issue with recovering from a
> machine check in a "rep mov" ... so I'm working with a version of memcpy
> that unrolls into individual mov instructions for now.

Ah.

> I can drop the "nti" from the destination moves. Does "nti" work
> on the load from source address side to avoid cache allocation?

I don't think so:

+1:	movq (%rsi),%r8
+2:	movq 1*8(%rsi),%r9
+3:	movq 2*8(%rsi),%r10
+4:	movq 3*8(%rsi),%r11
...

You need to load the data into registers first because MOVNTI needs them
there as it does reg -> mem movement. That first load from memory into
registers with a normal MOV will pull the data into the cache.

Perhaps the first thing to try would be to see what slowdown normal MOVs
bring and if not really noticeable, use those instead.

> On another topic raised by Boris ... is there some CONFIG_PMEM*
> that I should use as a dependency to enable all this?

I found CONFIG_LIBNVDIMM only today:

drivers/nvdimm/Kconfig:1:menuconfig LIBNVDIMM
drivers/nvdimm/Kconfig:2:	tristate "NVDIMM (Non-Volatile Memory Device) Support"

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
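
To illustrate the MOVNTI point above, here is a minimal userspace sketch in C. It is not the kernel routine under discussion. The names copy_words_nt(), nt_store() and nt_fence() are hypothetical, and it assumes an x86-64 compiler that provides the SSE2 intrinsic _mm_stream_si64() (which emits MOVNTI); on other targets it falls back to plain stores. The source words are read with ordinary loads into locals (registers), which is the step that allocates cache lines, and only the stores go out non-temporally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>	/* _mm_stream_si64() -> MOVNTI, _mm_sfence() */
#define HAVE_MOVNTI 1
#endif

/* Hypothetical helper: one 8-byte store, non-temporal where available. */
static inline void nt_store(long long *p, long long v)
{
#ifdef HAVE_MOVNTI
	_mm_stream_si64(p, v);	/* reg -> mem, bypasses cache allocation */
#else
	*p = v;			/* assumption: plain store as fallback */
#endif
}

/* Hypothetical helper: order the weakly-ordered NT stores. */
static inline void nt_fence(void)
{
#ifdef HAVE_MOVNTI
	_mm_sfence();
#endif
}

/*
 * Copy nwords 8-byte words from src to dst, four at a time.
 * Both pointers must be 8-byte aligned.
 */
static void copy_words_nt(long long *dst, const long long *src, size_t nwords)
{
	size_t i = 0;

	assert(((uintptr_t)dst & 7) == 0);
	assert(((uintptr_t)src & 7) == 0);

	for (; i + 4 <= nwords; i += 4) {
		/*
		 * Ordinary loads, mem -> reg. MOVNTI has no load form, so
		 * these pull the source lines into the cache. The compiler
		 * may schedule them differently than written here.
		 */
		long long a = src[i + 0];
		long long b = src[i + 1];
		long long c = src[i + 2];
		long long d = src[i + 3];

		/* Only the destination side avoids cache allocation. */
		nt_store(&dst[i + 0], a);
		nt_store(&dst[i + 1], b);
		nt_store(&dst[i + 2], c);
		nt_store(&dst[i + 3], d);
	}

	/* Tail: fewer than four words left. */
	for (; i < nwords; i++)
		nt_store(&dst[i], src[i]);

	nt_fence();
}
```

Replacing nt_store() with a plain assignment gives the "normal MOVs" variant, so the two can be timed against each other on the same buffers to see whether the slowdown is noticeable.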