On Sat, 2 Mar 2024 at 01:37, Tong Tiangen <tongtiangen@xxxxxxxxxx> wrote:
>
> I think this solution has two impacts:
> 1. Although it is not a performance-critical path, the CPU usage may be
> affected by one more memory copy in some large-memory applications.

Compared to the IO, the extra memory copy is a non-issue.

If anything, getting rid of the "copy_mc" flag removes extra code in a
much more important path (ie the normal iov_iter code).

> 2. If a hardware memory error occurs in "good location" and the
> ".copy_mc" is removed, the kernel will panic.

That's always true. We do not support non-recoverable machine checks
on kernel memory. Never have, and realistically probably never will.

In fact, as far as I know, the hardware that caused all this code in
the first place no longer exists, and never really made it to wide
production.

The machine checks in question happened on pmem, now killed by Intel.
It's possible that somebody wants to use it for something else, but
let's hope any future implementations are less broken than the
unbelievable sh*tshow that caused all this code in the first place.

The whole copy_mc_to_kernel() mess exists mainly due to broken pmem
devices along with old and broken CPU's that did not deal correctly
with machine checks inside the regular memory copy ('rep movs') code,
and caused hung machines.

IOW, notice how 'copy_mc_to_kernel()' just becomes a regular
'memcpy()' on fixed hardware, and how we have that disgusting
copy_mc_fragile_key that gets enabled for older CPU cores.

And yes, we then have copy_mc_enhanced_fast_string() which isn't
*that* disgusting, and that actually handles machine checks properly
on more modern hardware, but it's still very much "the hardware is
misdesigned, it has no testing, and nobody sane should depend on this"

In other words, it's the usual "Enterprise Hardware" situation. Looks
fancy on paper, costs an arm and a leg, and the reality is just sad,
sad, sad.

               Linus
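
For readers who have not looked at the code, the three-way dispatch described above can be modelled in userspace C roughly as follows. This is only a sketch under stated assumptions, not the kernel source: `fragile_enabled` is a plain boolean standing in for copy_mc_fragile_key, `cpu_has_mc_safe_string` is a hypothetical flag standing in for whatever CPU feature test selects copy_mc_enhanced_fast_string(), and `model_copy_mc_to_kernel()`, `enum copy_path` and `last_path` are invented names. No machine checks are simulated, so every path reports zero bytes left uncopied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real kernel uses a static key and a CPU feature test. */
static bool fragile_enabled;        /* models copy_mc_fragile_key (older CPU cores) */
static bool cpu_has_mc_safe_string; /* assumed flag: string copy that survives machine checks */

enum copy_path { PATH_FRAGILE, PATH_FAST_STRING, PATH_MEMCPY };
static enum copy_path last_path;    /* records which path the model took */

/*
 * Model of the dispatch only. Returns the number of bytes NOT copied,
 * which is always 0 here because no hardware fault is simulated.
 */
static size_t model_copy_mc_to_kernel(void *dst, const void *src, size_t len)
{
    if (fragile_enabled) {
        /* Older cores: careful slow copy, modelled as a byte loop. */
        unsigned char *d = dst;
        const unsigned char *s = src;
        for (size_t i = 0; i < len; i++)
            d[i] = s[i];
        last_path = PATH_FRAGILE;
        return 0;
    }
    if (cpu_has_mc_safe_string) {
        /* Stands in for copy_mc_enhanced_fast_string(). */
        memcpy(dst, src, len);
        last_path = PATH_FAST_STRING;
        return 0;
    }
    /* "Fixed hardware": nothing special, just a regular memcpy(). */
    memcpy(dst, src, len);
    last_path = PATH_MEMCPY;
    return 0;
}
```

With both flags left false, the model takes the plain memcpy() path, which is the "fixed hardware" case the message refers to.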