On Tue, Dec 15, 2015 at 10:27 AM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Tue, Dec 15, 2015 at 9:53 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>>>> ... and the non-temporal version is the optimal one even though we're
>>>> defaulting to copy_user_enhanced_fast_string for memcpy on modern Intel
>>>> CPUs...?
>>
>> My current generation cpu has a bit of an issue with recovering from a
>> machine check in a "rep mov" ... so I'm working with a version of memcpy
>> that unrolls into individual mov instructions for now.
>>
>>> At least the pmem driver use case does not want caching of the
>>> source-buffer since that is the raw "disk" media.  I.e. in
>>> pmem_do_bvec() we'd use this to implement memcpy_from_pmem().
>>> However, caching the destination-buffer may prove beneficial since
>>> that data is likely to be consumed immediately by the thread that
>>> submitted the i/o.
>>
>> I can drop the "nti" from the destination moves.  Does "nti" work
>> on the load from source address side to avoid cache allocation?
>
> My mistake, I don't think we have an uncached load capability, only store.

Correction: we have MOVNTDQA, but that requires saving the fpu state
and marking the memory as WC, i.e. probably not worth it.
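
To illustrate the asymmetry (this sketch is not from the thread): the
store side has a non-temporal form that works on ordinary write-back
memory, while the load side does not. Below is a rough userspace
approximation using SSE2 intrinsics, with cached loads from the source and
MOVNTDQ stores to the destination. The name copy_nt_store() is
hypothetical, and the code falls back to plain memcpy() when SSE2 is not
available. In the kernel, any use of XMM registers would also have to sit
between kernel_fpu_begin() and kernel_fpu_end(), which is the fpu state
cost mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/*
 * Hypothetical helper, for illustration only: copy with ordinary cached
 * loads and non-temporal stores (MOVNTDQ). The loads still allocate
 * cache lines for the source; only the destination bypasses the cache.
 */
static void copy_nt_store(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

#if defined(__SSE2__)
    /* MOVNTDQ needs a 16-byte aligned destination: copy a head first. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > len)
        head = len;
    memcpy(d, s, head);
    d += head;
    s += head;
    len -= head;

    while (len >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s); /* cached load */
        _mm_stream_si128((__m128i *)d, v);               /* NT store */
        d += 16;
        s += 16;
        len -= 16;
    }
    _mm_sfence();  /* order the NT stores before later stores */
#endif
    memcpy(d, s, len);  /* tail, or the whole copy without SSE2 */
}
```

A non-temporal load variant would swap the load for MOVNTDQA
(_mm_stream_load_si128, SSE4.1), but the hint only takes effect on memory
mapped WC, so on a normal write-back mapping it behaves like a regular
load.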