(Cc'ing f2fs and crypto as I've noticed something similar with f2fs a while ago, which may mean that this is not specific to EROFS:
https://lore.kernel.org/all/CAD14+f2nBZtLfLC6CwNjgCOuRRRjwzttp3D3iK4Of+1EEjK+cw@xxxxxxxxxxxxxx/ )

Hi.

I'm encountering a very weird EROFS data corruption.

I noticed that when I build an EROFS image for AOSP development, the device would randomly fail to boot from a certain build.
After inspecting the log, I noticed that a file got corrupted.

After adding a hash check during the build flow, I noticed that EROFS would randomly read data incorrectly.

I now have a reliable method of reproducing the issue, but here's the funny/weird part: it's only happening on my laptop (i7-1185G7). This is not happening on my 128-thread buildfarm machine (Threadripper 3990X).

I first suspected a hardware issue, but:
a. The laptop had its motherboard replaced recently (due to a failing physical Type-C port).
b. The laptop passes a memory test (memtest86).
c. This happens on all kernel versions from v5.4 to the latest v6.6, including my personal custom builds and Canonical's official Ubuntu kernels.
d. This happens on different host SSDs and file-system combinations.
e. This only happens with LZ4. LZ4HC doesn't trigger the issue.
f. This only happens when the image is mounted natively by the kernel. Using FUSE with erofsfuse is fine.

This is how I'm reproducing the issue:

# mkfs.erofs -zlz4 -T0 --ignore-mtime tmp.img /mnt/lib64/
mkfs.erofs 1.7
Build completed.
------
Filesystem UUID: 3a7e1f90-5450-40f9-92a2-945bacdb51c3
Filesystem total blocks: 53075 (of 4096-byte blocks)
Filesystem total inodes: 973
Filesystem total metadata blocks: 73
Filesystem total deduplicated bytes (of source files): 0
# mount tmp.img /mnt
# for i in {1..30}; do echo 3 > /proc/sys/vm/drop_caches; find /mnt -type f -exec xxh64sum {} + | sort -k2 | xxh64sum -; done
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
293a8e7de2a53019  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
293a8e7de2a53019  stdin
293a8e7de2a53019  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin
0b40f1abfbb6e9a8  stdin

As you can see, I get 0b40f1abfbb6e9a8 in most runs and 293a8e7de2a53019 in others.

This is what I see when I manually inspect the failing file:

# echo 3 > /proc/sys/vm/drop_caches; xxh64sum /mnt/vendor.qti.hardware.mwqemadapter@xxxxxx
dc96f35f015a0e5d  /mnt/vendor.qti.hardware.mwqemadapter@xxxxxx
# xxd < /mnt/vendor.qti.hardware.mwqemadapter@xxxxxx > /tmp/1
[ several more attempts until I get a different hash... ]
# echo 3 > /proc/sys/vm/drop_caches; xxh64sum /mnt/vendor.qti.hardware.mwqemadapter@xxxxxx
1cfe5d69c28fff6c  /mnt/vendor.qti.hardware.mwqemadapter@xxxxxx
# xxd < /mnt/vendor.qti.hardware.mwqemadapter@xxxxxx > /tmp/2
# diff /tmp/[12]
3741c3741
< 0000e9c0: f40e 0000 b46b 0000 ac5c 0000 140e 0000  .....k...\......
---
> 0000e9c0: 445a 0000 e40d 0000 ac5c 0000 140e 0000  DZ.......\......
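
The manual hunt for the failing file could be automated along these lines. This is a minimal sketch, not the script used for the results in this mail: `snapshot`, `changed_files` and the `HASH_CMD` variable are hypothetical names, and it assumes file paths without whitespace.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any existing tool).
# HASH_CMD defaults to xxh64sum as used above; any tool that prints
# "hash  path" per file should work as well.
HASH_CMD="${HASH_CMD:-xxh64sum}"

# snapshot DIR OUT: hash every regular file under DIR and write the
# list, sorted by path, to OUT.
snapshot() {
    find "$1" -type f -exec "$HASH_CMD" {} + | sort -k2 > "$2"
}

# changed_files A B: print the paths whose hash differs between the two
# snapshot lists A and B.
changed_files() {
    # Entries identical in both lists appear twice and are dropped by
    # "uniq -u"; what remains are the mismatching entries. Keep only the
    # path column (assumes paths contain no whitespace).
    sort "$1" "$2" | uniq -u | awk '{print $2}' | sort -u
}
```

One possible use, as root: run `snapshot /mnt /tmp/a`, then `echo 3 > /proc/sys/vm/drop_caches`, then `snapshot /mnt /tmp/b`, and finally `changed_files /tmp/a /tmp/b`, repeating until the last command prints something.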

This could still very well be a hardware issue on my side, but I highly suspect that something is wrong in the kernel code that happens to trigger only on my hardware configuration.

I've uploaded the generated image here: https://arter97.com/.erofs/
but I'm not sure it'll be reproducible on other machines.

I've also tried updating the LZ4 module in lib/lz4 to the latest v1.9.4 and to the latest dev trunk (4032c8c787e6). I've managed to get it working with the Linux kernel, but the corruption still happens.

Let me know if there's anything I can do to help narrow down the culprit.

Thanks,