> On Mon, Apr 10, 2017 at 2:28 PM, Kani, Toshimitsu <toshi.kani@xxxxxxx>
> wrote:
> >> > Thanks for the update.  I think the alignment check should be based on
> >> > the following note in copy_user_nocache.
> >> >
> >> >  * Note: Cached memory copy is used when destination or size is not
> >> >  * naturally aligned. That is:
> >> >  *  - Require 8-byte alignment when size is 8 bytes or larger.
> >> >  *  - Require 4-byte alignment when size is 4 bytes.
> >> >
> >> > So, I think the code may be something like this.  I also made the
> >> > following changes:
> >>
> >> Thanks!
> >>
> >> > - Mask with 7, not 8.
> >>
> >> Yes, good catch.
> >>
> >> > - ALIGN with cacheline size, instead of 8.
> >> > - Add (bytes > flushed) test since calculation with unsigned long
> >> >   still results in a negative value (as a positive value).
> >> >
> >> >	if (bytes < 8) {
> >> >		if ((dest & 3) || (bytes != 4))
> >> >			arch_wb_cache_pmem(addr, 1);
> >> >	} else {
> >> >		if (dest & 7) {
> >> >			dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
> >>
> >> Why align the destination to the next cacheline?  As far as I can see
> >> the ALIGN_DESTINATION macro in arch/x86/include/asm/asm.h only aligns
> >> to the next 8-byte boundary.
> >
> > The clflush here flushes for the cacheline size.  So, we do not need
> > to flush the same cacheline again when the unaligned tail is in the
> > same line.
>
> Ok, makes sense.  Last question, can't we reduce the check to be:
>
>     if ((bytes > flushed) && ((bytes - flushed) & 3))
>
> ...since if 'bytes' was 4-byte aligned we would have performed
> non-temporal stores.

That is not documented behavior of copy_user_nocache, but as long as the
pmem version of copy_user_nocache follows the same implemented behavior,
yes, that works.

> Can I add your Signed-off-by: on v3?

Sure.

Thanks,
-Toshi