On 12/20/2024 3:27 PM, Dave Hansen wrote: > On 12/20/24 13:36, Sohil Mehta wrote: >> X86_FEATURE_REP_GOOD is only set for family 6 processors. Extend the >> check to family numbers beyond 15. > > Could you explain why, please? > To answer this I was trying to understand where Fast string (X86_FEATURE_REP_GOOD) is used. It looks like copy_page() in lib/copy_page_64.S is the only place it really matters. clear_page() in include/asm/page_64.h would likely use Enhanced fast strings (ERMS) if available. Would it be correct to say that copy_page() and potentially clear_page() would be slower on Family 18/19 CPUs without the fix? >> It is uncertain whether the Pentium 4s (family 15) should set the >> feature flag as well. Commit 185f3b9da24c ("x86: make intel.c have >> 64-bit support code") that originally set X86_FEATURE_REP_GOOD also set >> the x86_cache_alignment preference for family 15 processors. The >> omission of the family 15 seems intentional. >> Analyzing more, it seems the below check in early_init_intel() is not really effective. /* * If fast string is not enabled in IA32_MISC_ENABLE for any reason, * clear the fast string and enhanced fast string CPU capabilities. */ if (c->x86_vfm >= INTEL_PENTIUM_M_DOTHAN) { rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable); if (!(misc_enable & MSR_IA32_MISC_ENABLE_FAST_STRING)) { pr_info("Disabled fast string operations\n"); setup_clear_cpu_cap(X86_FEATURE_REP_GOOD); setup_clear_cpu_cap(X86_FEATURE_ERMS); } } It gets overridden later in intel_init() with the below code: if (c->x86 == 6) set_cpu_cap(c, X86_FEATURE_REP_GOOD); Shouldn't the order of this be the other way around? >> Also, the 32-bit user copy alignment preference is only set for family 6 >> and 15 processors. Extend the preference to family numbers beyond 15. > > Can you please provide some more context so it's clear which hunk this > refers to? Alternatively, can you break this out into a separate patch? > This is referring to the below chunk. Separating it seems like a better idea to avoid ambiguity. > - > #ifdef CONFIG_X86_INTEL_USERCOPY > /* > * Set up the preferred alignment for movsl bulk memory moves > + * Family 4 - 486: untested > + * Family 5 - Pentium: untested > + * Family 6 - PII/PIII only like movsl with 8-byte alignment > + * Family 15 - P4 is OK down to 8-byte alignment > */ > - switch (c->x86) { > - case 4: /* 486: untested */ > - break; > - case 5: /* Old Pentia: untested */ > - break; > - case 6: /* PII/PIII only like movsl with 8-byte alignment */ > - movsl_mask.mask = 7; > - break; > - case 15: /* P4 is OK down to 8-byte alignment */ > + if (c->x86_vfm >= INTEL_PENTIUM_PRO) > movsl_mask.mask = 7; > - break; > - } > #endif