V1->V2:
  Add patch 3 to fix a compilation error on 32-bit architectures without
  CONFIG_SMP enabled.

This patchset follows Linus's suggestion to make the i_size_read/write
helpers be smp_load_acquire/store_release(). With that change the extra
smp_rmb() in filemap_read() is no longer needed, so it is removed. Patch 3
removes the extra type checking in smp_load_acquire/smp_store_release in
the !CONFIG_SMP case to avoid compilation errors.

Functional tests were performed and no new problems were found.

Here are the results of unixbench tests based on 6.7.0-next-20240118 on
arm64. There is some degradation in the single-copy run (total -1.43%) and
some improvement in the 72-copy run (total +2.41%), but overall the impact
is not significant.

### 72 CPUs in system; running 1 parallel copy of tests
System Benchmarks Index Values        | base    | patched | cmp    |
--------------------------------------|---------|---------|--------|
Dhrystone 2 using register variables  | 3635.06 | 3596.3  | -1.07% |
Double-Precision Whetstone            | 808.58  | 808.58  | 0.00%  |
Execl Throughput                      | 623.52  | 618.1   | -0.87% |
File Copy 1024 bufsize 2000 maxblocks | 1715.82 | 1668.58 | -2.75% |
File Copy 256 bufsize 500 maxblocks   | 1320.98 | 1250.16 | -5.36% |
File Copy 4096 bufsize 8000 maxblocks | 2639.36 | 2488.48 | -5.72% |
Pipe Throughput                       | 869.06  | 872.3   | 0.37%  |
Pipe-based Context Switching          | 106.26  | 117.22  | 10.31% |
Process Creation                      | 247.72  | 246.74  | -0.40% |
Shell Scripts (1 concurrent)          | 1234.98 | 1226    | -0.73% |
Shell Scripts (8 concurrent)          | 6893.96 | 6210.46 | -9.91% |
System Call Overhead                  | 493.72  | 494.28  | 0.11%  |
--------------------------------------|---------|---------|--------|
Total                                 | 1003.92 | 989.58  | -1.43% |

### 72 CPUs in system; running 72 parallel copies of tests
System Benchmarks Index Values        | base      | patched   | cmp    |
--------------------------------------|-----------|-----------|--------|
Dhrystone 2 using register variables  | 260471.88 | 258065.04 | -0.92% |
Double-Precision Whetstone            | 58212.32  | 58219.3   | 0.01%  |
Execl Throughput                      | 6954.7    | 7444.08   | 7.04%  |
File Copy 1024 bufsize 2000 maxblocks | 64244.74  | 64618.24  | 0.58%  |
File Copy 256 bufsize 500 maxblocks   | 89933.8   | 87026.38  | -3.23% |
File Copy 4096 bufsize 8000 maxblocks | 79808.14  | 81916.42  | 2.64%  |
Pipe Throughput                       | 62174.38  | 62389.74  | 0.35%  |
Pipe-based Context Switching          | 27239.28  | 27887.24  | 2.38%  |
Process Creation                      | 3551.28   | 3800.54   | 7.02%  |
Shell Scripts (1 concurrent)          | 19212.26  | 20749.34  | 8.00%  |
Shell Scripts (8 concurrent)          | 20842.02  | 21958.12  | 5.36%  |
System Call Overhead                  | 35328.24  | 35451.68  | 0.35%  |
--------------------------------------|-----------|-----------|--------|
Total                                 | 35592.42  | 36450.36  | 2.41%  |

Baokun Li (3):
  fs: make the i_size_read/write helpers be smp_load_acquire/store_release()
  Revert "mm/filemap: avoid buffered read/write race to read inconsistent data"
  asm-generic: remove extra type checking in acquire/release for non-SMP case

 include/asm-generic/barrier.h |  2 --
 include/linux/fs.h            | 10 ++++++++--
 mm/filemap.c                  |  9 ---------
 3 files changed, 8 insertions(+), 13 deletions(-)

-- 
2.31.1
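
For illustration only, here is a minimal userspace sketch of the acquire/release pairing that patch 1 relies on. It is not the kernel code: it uses C11 atomics in place of smp_store_release()/smp_load_acquire(), and struct demo_inode, demo_write() and demo_read() are hypothetical names made up for this example. It assumes a single writer that fills the data once and then publishes the size.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct inode; NOT the kernel definition. */
struct demo_inode {
	_Atomic int64_t i_size;	/* published file size */
	char data[64];		/* stands in for page cache contents */
};

/*
 * Writer side: copy the data first, then publish the new size with
 * release semantics so the data stores are ordered before the size store.
 */
void demo_write(struct demo_inode *inode, const char *buf, int64_t len)
{
	assert(len >= 0 && (size_t)len <= sizeof(inode->data));
	memcpy(inode->data, buf, (size_t)len);
	atomic_store_explicit(&inode->i_size, len, memory_order_release);
}

/*
 * Reader side: load the size with acquire semantics, then copy at most
 * that many bytes. The acquire load orders the later data loads after
 * it, so no separate read barrier is needed between the two steps.
 */
int64_t demo_read(struct demo_inode *inode, char *out, int64_t cap)
{
	int64_t size, n;

	assert(cap >= 0);
	size = atomic_load_explicit(&inode->i_size, memory_order_acquire);
	n = size < cap ? size : cap;
	memcpy(out, inode->data, (size_t)n);
	return n;
}
```

A reader that observes the new size through the acquire load is guaranteed to also observe the data written before the matching release store, which is the ordering the smp_rmb() in filemap_read() was providing by hand.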