> >> > The clflush here flushes for the cacheline size. So, we do not need to flush
> >> > the same cacheline again when the unaligned tail is in the same line.
> >>
> >> Ok, makes sense. Last question, can't we reduce the check to be:
> >>
> >>         if ((bytes > flushed) && ((bytes - flushed) & 3))
> >>
> >> ...since if 'bytes' was 4-byte aligned we would have performed
> >> non-temporal stores.
> >
> > That is not documented behavior of copy_user_nocache, but as long as the pmem
> > version of copy_user_nocache follows the same implemented behavior, yes, that
> > works.
>
> Hmm, sorry this comment confuses me, I'm only referring to the current
> version of __copy_user_nocache not the new pmem version. The way I
> read the current code we only ever jump to the cached copy loop
> (.L_1b_cache_copy_loop) if the trailing byte-count is 4-byte
> misaligned.

Yes, you are right, and that is how the code is implemented. I added
this trailing 4-byte handling for the >=8B case, which is shared with
the <8B case, since it was easy to do. But I considered it a bonus.
This function would also need to handle a 4B-aligned destination if it
is to state that it handles 4B alignment for the >=8B case as well.
Otherwise, it is inconsistent. Since I did not see much point in
supporting such a case, I simply documented in the Note that 8-byte
alignment is required for the >=8B case.

Thanks,
-Toshi
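
The tail check discussed above can be sketched as a small user-space C
model. This is an illustration under stated assumptions, not the kernel
code: tail_needs_flush() and align_up() are hypothetical names, and
'flushed' is assumed to be the number of bytes already covered by
flushing the head cacheline when the destination is not 8-byte aligned.
The tail_align parameter selects between the proposed 4-byte check
("& 3") and a check against the documented 8-byte requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round addr up to a power-of-two boundary. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of the tail check for a copy of 'bytes' (>= 8)
 * to 'dest'.
 *
 * Assumption: when 'dest' is not 8-byte aligned, the head cacheline is
 * flushed once, which covers everything up to the next cacheline
 * boundary. A tail that ends inside that same line needs no second
 * flush.
 *
 * tail_align = 4 models "(bytes - flushed) & 3" from the thread;
 * tail_align = 8 models a check against 8-byte alignment.
 */
static bool tail_needs_flush(uintptr_t dest, size_t bytes,
			     size_t clflush_size, size_t tail_align)
{
	uintptr_t end_of_head = dest;
	size_t flushed;

	if (dest % 8 != 0)
		end_of_head = align_up(dest, clflush_size);

	flushed = (size_t)(end_of_head - dest);

	return bytes > flushed &&
	       ((bytes - flushed) & (tail_align - 1)) != 0;
}
```

With an 8-byte-aligned destination and bytes = 12, the two variants
disagree: the 4-byte check reports no tail flush, while the 8-byte
check does. With dest = 0x1004, bytes = 32 and a 64-byte cacheline,
both report no flush, since the tail lies in the line already covered
by the head flush.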