RE: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> Subject: [PATCH v2] x86, pmem: fix broken __copy_user_nocache cache-
> bypass assumptions
> 
> Before we rework the "pmem api" to stop abusing __copy_user_nocache()
> for memcpy_to_pmem() we need to fix cases where we may strand dirty data
> in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
> for arbitrary data transfers from userspace. There is no guarantee that
> these transfers, performed by dax_iomap_actor(), will have aligned
> destinations or aligned transfer lengths. Backstop the usage
> __copy_user_nocache() with explicit cache management in these unaligned
> cases.
> 
> Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
> that is saved for a later patch that moves the entirety of the "pmem
> api" into the pmem driver directly.
 :
> ---
> v2: Change the condition for flushing the last cacheline of the
>     destination from 8-byte to 4-byte misalignment (Toshi)
 :
>  arch/x86/include/asm/pmem.h |   41 ++++++++++++++++++++++++++++++----
 :
> @@ -94,7 +86,34 @@ static inline size_t arch_copy_from_iter_pmem(void
> *addr, size_t bytes,
>  	/* TODO: skip the write-back by always using non-temporal stores */
>  	len = copy_from_iter_nocache(addr, bytes, i);
> 
> -	if (__iter_needs_pmem_wb(i))
> +	/*
> +	 * In the iovec case on x86_64 copy_from_iter_nocache() uses
> +	 * non-temporal stores for the bulk of the transfer, but we need
> +	 * to manually flush if the transfer is unaligned. In the
> +	 * non-iovec case the entire destination needs to be flushed.
> +	 */
> +	if (iter_is_iovec(i)) {
> +		unsigned long dest = (unsigned long) addr;
> +
> +		/*
> +		 * If the destination is not 8-byte aligned then
> +		 * __copy_user_nocache (on x86_64) uses cached copies
> +		 */
> +		if (dest & 8) {
> +			arch_wb_cache_pmem(addr, 1);
> +			dest = ALIGN(dest, 8);
> +		}
> +
> +		/*
> +		 * If the remaining transfer length, after accounting
> +		 * for destination alignment, is not 4-byte aligned
> +		 * then __copy_user_nocache() falls back to cached
> +		 * copies for the trailing bytes in the final cacheline
> +		 * of the transfer.
> +		 */
> +		if ((bytes - (dest - (unsigned long) addr)) & 4)
> +			arch_wb_cache_pmem(addr + bytes - 1, 1);
> +	} else
>  		arch_wb_cache_pmem(addr, bytes);
> 
>  	return len;

Thanks for the update.  I think the alignment check should be based on
the following note in copy_user_nocache.

 * Note: Cached memory copy is used when destination or size is not
 * naturally aligned. That is:
 *  - Require 8-byte alignment when size is 8 bytes or larger.
 *  - Require 4-byte alignment when size is 4 bytes.

So, I think the code may be something like this.  I also made the following changes:
 - Mask with 7, not 8.
 - ALIGN with cacheline size, instead of 8.
 - Add (bytes > flushed) test since calculation with unsigned long still results in a negative
   value (as a positive value).

        if (bytes < 8) {
                if ((dest & 3) || (bytes != 4))
                        arch_wb_cache_pmem(addr, 1);
        } else {
                if (dest & 7) {
                        dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
                        arch_wb_cache_pmem(addr, 1);
                }

                flushed = dest - (unsigned long) addr;
                if ((bytes > flushed) && ((bytes - flushed) & 7))
                        arch_wb_cache_pmem(addr + bytes - 1, 1);
        }

Thanks,
-Toshi




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]