On Tue, May 19, 2020 at 03:12:42PM -0700, Dan Williams wrote:
> The original copy_mc_fragile() implementation had negative performance
> implications since it did not use the fast-string instruction sequence
> to perform copies. For this reason copy_mc_to_kernel() fell back to
> plain memcpy() to preserve performance on platform that did not indicate
> the capability to recover from machine check exceptions. However, that
> capability detection was not architectural and now that some platforms
> can recover from fast-string consumption of memory errors the memcpy()
> fallback now causes these more capable platforms to fail.
>
> Introduce copy_mc_generic() as the fast default implementation of
> copy_mc_to_kernel() and finalize the transition of copy_mc_fragile() to
> be a platform quirk to indicate 'fragility'. With this in place
> copy_mc_to_kernel() is fast and recovery-ready by default regardless of
> hardware capability.
>
> Thanks to Vivek for identifying that copy_user_generic() is not suitable
> as the copy_mc_to_user() backend since the #MC handler explicitly checks
> ex_has_fault_handler().

/me is curious to know why the #MC handler mandates use of
_ASM_EXTABLE_FAULT().

[..]
> +/*
> + * copy_mc_generic - memory copy with exception handling
> + *
> + * Fast string copy + fault / exception handling. If the CPU does
> + * support machine check exception recovery, but does not support
> + * recovering from fast-string exceptions then this CPU needs to be
> + * added to the copy_mc_fragile_key set of quirks. Otherwise, absent any
> + * machine check recovery support this version should be no slower than
> + * standard memcpy.
> + */
> +SYM_FUNC_START(copy_mc_generic)
> +	ALTERNATIVE "jmp copy_mc_fragile", "", X86_FEATURE_ERMS
> +	movq %rdi, %rax
> +	movq %rdx, %rcx
> +.L_copy:
> +	rep movsb
> +	/* Copy successful. Return zero */
> +	xorl %eax, %eax
> +	ret
> +SYM_FUNC_END(copy_mc_generic)
> +EXPORT_SYMBOL_GPL(copy_mc_generic)
> +
> +	.section .fixup, "ax"
> +.E_copy:
> +	/*
> +	 * On fault %rcx is updated such that the copy instruction could
> +	 * optionally be restarted at the fault position, i.e. it
> +	 * contains 'bytes remaining'. A non-zero return indicates error
> +	 * to copy_safe() users, or indicate short transfers to

Is copy_safe() a vestige of the terminology of previous patches?

> +	 * user-copy routines.
> +	 */
> +	movq %rcx, %rax
> +	ret
> +
> +	.previous
> +
> +	_ASM_EXTABLE_FAULT(.L_copy, .E_copy)

A question for my own education. So copy_mc_generic() can handle an MCE
on both the source and the destination address (assuming some device can
generate an MCE on stores too)? On the other hand, copy_mc_fragile()
handles MCE recovery only on the source and non-MCE recovery on the
destination.

Thanks
Vivek
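
P.S. To check that I am reading the return convention in the fixup comment correctly, here is a rough user-space C model of it. This is only a sketch: copy_mc_model() and its poison_off parameter are hypothetical names of mine, not anything in the patch, and a plain memcpy() stands in for the interrupted "rep movsb". It assumes the bytes before the faulting position have been copied and that the return value is the 'bytes remaining' count left in %rcx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space model only (hypothetical helper, not kernel code).
 * 'poison_off' stands in for the source offset at which a machine
 * check would be raised; pass an offset >= len to model a clean copy.
 */
static size_t copy_mc_model(void *dst, const void *src, size_t len,
			    size_t poison_off)
{
	/* Number of bytes transferred before the modelled fault. */
	size_t done = poison_off < len ? poison_off : len;

	assert(dst != NULL && src != NULL);

	/* Assumption: bytes ahead of the fault position are copied. */
	memcpy(dst, src, done);

	/* 0 on success, otherwise 'bytes remaining' (%rcx at the fault). */
	return len - done;
}
```

If that reading is right, a caller sees 0 for a complete copy and a non-zero remainder for a short one, so an 8-byte copy that faults at offset 3 would return 5.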