Re: Weird EROFS data corruption

Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> · Tue, 5 Dec 2023 22:34:37 +0800

On 2023/12/5 22:23, Juhyung Park wrote:
Hi Gao,

On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:

Hi Juhyung,

On 2023/12/4 11:41, Juhyung Park wrote:

...


- Could you share the full output of `lscpu`?

Sure:

Architecture:            x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Address sizes:         39 bits physical, 48 bits virtual
    Byte Order:            Little Endian
CPU(s):                  8
    On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
    BIOS Vendor ID:        Intel(R) Corporation
    Model name:            11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
      BIOS Model name:     11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU
                            @ 3.0GHz
      BIOS CPU family:     198
      CPU family:          6
      Model:               140
      Thread(s) per core:  2
      Core(s) per socket:  4
      Socket(s):           1
      Stepping:            1
      CPU(s) scaling MHz:  60%
      CPU max MHz:         4800.0000
      CPU min MHz:         400.0000
      BogoMIPS:            5990.40
      Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                           a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
                           ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
                            arch_perfmon pebs bts rep_good nopl xtopology nonstop_
                           tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes6
                           4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
                           pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_dead
                           line_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowp
                           refetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb st
                           ibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_
                           ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
                            rdt_a avx512f avx512dq rdseed adx smap avx512ifma clfl
                           ushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl
                           xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm
                            ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
                            hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi
                           2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme av
                           x512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i

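For anyone comparing environments, a minimal sketch of checking a flags string for one feature, such as the `fsrm` entry above. The helper name `has_cpu_flag` is hypothetical, and it assumes an unwrapped, whitespace-separated list (for example the `flags` line of /proc/cpuinfo). The wrapped `lscpu` listing above splits some names across lines, so it would need rejoining first.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `flag` appears as a whole
 * whitespace-separated token in `flags`. Substring hits such as
 * "fsrm" inside a longer name are rejected.
 */
static bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return false;

    while ((p = strstr(p, flag)) != NULL) {
        /* Token must start at the beginning or right after whitespace. */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == '\n';
        /* Token must end at the string end or right before whitespace. */
        bool ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

        if (starts && ends)
            return true;
        p += 1; /* keep scanning after a partial match */
    }
    return false;
}
```

Calling `has_cpu_flag(line, "fsrm")` and `has_cpu_flag(line, "erms")` on each machine's flags line gives a quick way to confirm whether two test systems differ in these features.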
Sigh, I've been thinking.  FSRM is the most significant difference between
our environments.  Could you try only the following diff (without the
previous disable patch) to see whether the issue still reproduces?

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 1b60ae81ecd8..1b52a913233c 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
   #define CHECK_LEN     cmp $0x20, %rdx; jb 1f
   #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
   .Lmemmove_begin_forward:
-       ALTERNATIVE_2 __stringify(CHECK_LEN), \
-                     __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
-                     __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
+       CHECK_LEN

         /*
          * movsq instruction have many startup latency
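To make the effect of the diff above easier to follow, here is a simplified C model of which forward-copy path the removed ALTERNATIVE_2 site selects. It is only a sketch: the enum and function names are hypothetical, and it assumes that the later alternative (X86_FEATURE_FSRM) takes precedence when both features are present. With the diff applied, every CPU behaves like the case where both flags are false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the paths reachable from .Lmemmove_begin_forward. */
enum fwd_path {
    FWD_REP_MOVSB,   /* MEMMOVE_BYTES: rep movsb over the whole length */
    FWD_SMALL,       /* CHECK_LEN jumped to 1f: length below 0x20 */
    FWD_FALLTHROUGH  /* continue into the code following the macro site */
};

/*
 * Simplified model, assuming FSRM overrides ERMS when both are set:
 *   FSRM:    MEMMOVE_BYTES              (no length check at all)
 *   ERMS:    CHECK_LEN; MEMMOVE_BYTES
 *   neither: CHECK_LEN                  (what the diff hard-codes)
 */
static enum fwd_path select_forward_path(size_t len, bool has_erms, bool has_fsrm)
{
    if (has_fsrm)
        return FWD_REP_MOVSB;   /* rep movsb even for very short copies */
    if (len < 0x20)
        return FWD_SMALL;       /* cmp $0x20, %rdx; jb 1f */
    if (has_erms)
        return FWD_REP_MOVSB;
    return FWD_FALLTHROUGH;
}
```

In this model the only behavior unique to FSRM is that copies shorter than 0x20 bytes also go through `rep movsb`, which is one place to look when comparing a machine with FSRM against one without it.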

Yup, that also seems to fix it.
Are we looking at a potential memmove issue?

I'm still analyzing this behavior and its root cause.  I will also
try to get a recent cloud server with FSRM myself to find more
clues.

Thanks,
Gao Xiang