Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> On Tue, Oct 29, 2024 at 12:36:11PM +0100, Peter Zijlstra wrote:
> static __always_inline void *__inline_memcpy(void *to, const void *from, size_t len)
> {
> 	void *ret = to;
>
> -	asm volatile("rep movsb"
> -		     : "+D" (to), "+S" (from), "+c" (len)
> -		     : : "memory");
> -	return ret;
> +	asm volatile("1:\n\t"
> +		     ALT_64("rep movsb",
> +			    "call rep_movs_alternative", ALT_NOT(X86_FEATURE_FSRM))

I don't know if it matters, but this basically brings a whole memcpy
into a text_poke situation, which should only be a handful of bytes, and
it creates a new stack frame in the !FSRM case, which the __always_inline
was intended to avoid.

But given what text_poke is, maybe micro-optimizations don't really
matter. And fewer memcpy() implementations seems like a good idea.

Thanks,

-- 
Alex