Hi Gao,

On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Juhyung,
>
> On 2023/12/4 11:41, Juhyung Park wrote:
>
> ...
>
> >
> >>
> >> - Could you share the full message about the output of `lscpu`?
> >
> > Sure:
> >
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Address sizes: 39 bits physical, 48 bits virtual
> > Byte Order: Little Endian
> > CPU(s): 8
> > On-line CPU(s) list: 0-7
> > Vendor ID: GenuineIntel
> > BIOS Vendor ID: Intel(R) Corporation
> > Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
> > BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU
> > @ 3.0GHz
> > BIOS CPU family: 198
> > CPU family: 6
> > Model: 140
> > Thread(s) per core: 2
> > Core(s) per socket: 4
> > Socket(s): 1
> > Stepping: 1
> > CPU(s) scaling MHz: 60%
> > CPU max MHz: 4800.0000
> > CPU min MHz: 400.0000
> > BogoMIPS: 5990.40
> > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
> > a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> > ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
> > arch_perfmon pebs bts rep_good nopl xtopology nonstop_
> > tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes6
> > 4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
> > pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_dead
> > line_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowp
> > refetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb st
> > ibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_
> > ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
> > rdt_a avx512f avx512dq rdseed adx smap avx512ifma clfl
> > ushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl
> > xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm
> > ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
> > hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi
> > 2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme av
> > x512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i
>
> Sigh, I've been thinking. Here FSRM is the most significant difference between
> our environments, could you only try the following diff to see if there's any
> difference anymore? (without the previous disable patch.)
>
> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> index 1b60ae81ecd8..1b52a913233c 100644
> --- a/arch/x86/lib/memmove_64.S
> +++ b/arch/x86/lib/memmove_64.S
> @@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
>  #define CHECK_LEN cmp $0x20, %rdx; jb 1f
>  #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
>  .Lmemmove_begin_forward:
> -       ALTERNATIVE_2 __stringify(CHECK_LEN), \
> -                     __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
> -                     __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
> +       CHECK_LEN
>
>         /*
>          * movsq instruction have many startup latency

Yup, that also seems to fix it.
Are we looking at a potential memmove issue?

>
> Thanks,
> Gao Xiang
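
P.S. A side note for anyone comparing environments as above: the wrapped lscpu output splits some flag names across lines (for example "clfl" / "ushopt"), so searching it for a flag can miss a match. Below is a minimal sketch, assuming a Linux system where /proc/cpuinfo is readable, that reports whether the CPU advertises erms and fsrm. The helper names cpu_flags and rep_movs_features are hypothetical and not from this thread.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text.

    Only the first "flags" line is used; on typical systems every
    logical CPU reports the same set.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def rep_movs_features(cpuinfo_text):
    """Report whether the 'erms' and 'fsrm' flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("erms", "fsrm")}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(rep_movs_features(f.read()))
    except OSError:
        # Assumption: non-Linux systems have no /proc/cpuinfo.
        print("no readable /proc/cpuinfo on this system")
```

The names erms and fsrm are the /proc/cpuinfo spellings of the X86_FEATURE_ERMS and X86_FEATURE_FSRM bits that the diff above stops selecting on.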