On Mon, Apr 10, 2017 at 11:53 AM, Kani, Toshimitsu <toshi.kani@xxxxxxx> wrote:
>> Subject: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
>>
>> Before we rework the "pmem api" to stop abusing __copy_user_nocache()
>> for memcpy_to_pmem() we need to fix cases where we may strand dirty data
>> in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
>> for arbitrary data transfers from userspace. There is no guarantee that
>> these transfers, performed by dax_iomap_actor(), will have aligned
>> destinations or aligned transfer lengths. Backstop the usage of
>> __copy_user_nocache() with explicit cache management in these unaligned
>> cases.
>>
>> Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
>> that is saved for a later patch that moves the entirety of the "pmem
>> api" into the pmem driver directly.
> :
>> ---
>> v2: Change the condition for flushing the last cacheline of the
>>     destination from 8-byte to 4-byte misalignment (Toshi)
> :
>>  arch/x86/include/asm/pmem.h | 41 ++++++++++++++++++++++++++++++----
> :
>> @@ -94,7 +86,34 @@ static inline size_t arch_copy_from_iter_pmem(void *addr, size_t bytes,
>>          /* TODO: skip the write-back by always using non-temporal stores */
>>          len = copy_from_iter_nocache(addr, bytes, i);
>>
>> -        if (__iter_needs_pmem_wb(i))
>> +        /*
>> +         * In the iovec case on x86_64 copy_from_iter_nocache() uses
>> +         * non-temporal stores for the bulk of the transfer, but we need
>> +         * to manually flush if the transfer is unaligned. In the
>> +         * non-iovec case the entire destination needs to be flushed.
>> +         */
>> +        if (iter_is_iovec(i)) {
>> +                unsigned long dest = (unsigned long) addr;
>> +
>> +                /*
>> +                 * If the destination is not 8-byte aligned then
>> +                 * __copy_user_nocache (on x86_64) uses cached copies
>> +                 */
>> +                if (dest & 8) {
>> +                        arch_wb_cache_pmem(addr, 1);
>> +                        dest = ALIGN(dest, 8);
>> +                }
>> +
>> +                /*
>> +                 * If the remaining transfer length, after accounting
>> +                 * for destination alignment, is not 4-byte aligned
>> +                 * then __copy_user_nocache() falls back to cached
>> +                 * copies for the trailing bytes in the final cacheline
>> +                 * of the transfer.
>> +                 */
>> +                if ((bytes - (dest - (unsigned long) addr)) & 4)
>> +                        arch_wb_cache_pmem(addr + bytes - 1, 1);
>> +        } else
>>                  arch_wb_cache_pmem(addr, bytes);
>>
>>          return len;
>
> Thanks for the update. I think the alignment check should be based on
> the following note in copy_user_nocache.
>
>  * Note: Cached memory copy is used when destination or size is not
>  *       naturally aligned. That is:
>  *   - Require 8-byte alignment when size is 8 bytes or larger.
>  *   - Require 4-byte alignment when size is 4 bytes.
>
> So, I think the code may be something like this. I also made the
> following changes:

Thanks!

> - Mask with 7, not 8.

Yes, good catch.

> - ALIGN with cacheline size, instead of 8.
> - Add (bytes > flushed) test since calculation with unsigned long still
>   results in a negative value (as a positive value).
>
>   if (bytes < 8) {
>           if ((dest & 3) || (bytes != 4))
>                   arch_wb_cache_pmem(addr, 1);
>   } else {
>           if (dest & 7) {
>                   dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);

Why align the destination to the next cacheline? As far as I can see
the ALIGN_DESTINATION macro in arch/x86/include/asm/asm.h only aligns
to the next 8-byte boundary.
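
For reference, below is a self-contained userspace sketch of the flush-decision
rules under discussion, built from the __copy_user_nocache note and the bullet
points quoted above. It is not the kernel patch itself: wb_cacheline() and
flush_unaligned_edges() are made-up stand-ins for arch_wb_cache_pmem(),
CACHELINE stands in for boot_cpu_data.x86_clflush_size, and the tail test
(flush when the remainder after the head flush is not an 8-byte multiple) is
one reading of that note, so the eventual patch may differ in detail.

    #include <stdio.h>
    #include <stddef.h>

    #define CACHELINE   64UL    /* stand-in for boot_cpu_data.x86_clflush_size */
    #define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

    static void wb_cacheline(unsigned long addr)
    {
            printf("  write back cacheline covering 0x%lx\n", addr);
    }

    /*
     * Decide which edges of a non-temporal copy to [dest, dest + bytes)
     * may have gone through the cache and so need an explicit write-back.
     */
    static void flush_unaligned_edges(unsigned long dest, size_t bytes)
    {
            if (bytes < 8) {
                    /* only an aligned 4-byte store bypasses the cache */
                    if ((dest & 3) || bytes != 4)
                            wb_cacheline(dest);
            } else {
                    unsigned long flushed = 0;

                    /* head: cached stores are used up to the first 8-byte boundary */
                    if (dest & 7) {
                            wb_cacheline(dest);
                            flushed = ALIGN(dest, CACHELINE) - dest;
                    }

                    /*
                     * tail: a remainder that is not a multiple of 8 ends with
                     * cached byte stores; skip it if the head flush already
                     * covered those bytes (the "bytes > flushed" guard).
                     */
                    if (bytes > flushed && ((bytes - flushed) & 7))
                            wb_cacheline(dest + bytes - 1);
            }
    }

    int main(void)
    {
            printf("dest=0x1003, bytes=100:\n");
            flush_unaligned_edges(0x1003, 100);
            printf("dest=0x1000, bytes=100:\n");
            flush_unaligned_edges(0x1000, 100);
            printf("dest=0x1000, bytes=96:\n");
            flush_unaligned_edges(0x1000, 96);
            return 0;
    }

Compiled as a normal program it prints which cachelines would be written back:
the unaligned dest of 0x1003 triggers both a head and a tail flush, a 100-byte
copy to an aligned dest needs only a tail flush, and a fully 8-byte-multiple
transfer to an aligned dest needs none.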