No matter how the MFR policy is designed, userspace will eventually be notified of poisoned memory via SIGBUS with si_code BUS_MCEERR_AR or BUS_MCEERR_AO. So far I personally prefer the global MFR policy, but I am open to feedback on both options, or to new ideas.

[1] https://lwn.net/Articles/978732
[2] https://cloud.google.com/compute/docs/memory-optimized-machines#m2_machine_types
[3] https://cloud.google.com/compute/sla
[4] https://lore.kernel.org/lkml/20230428004139.2899856-1-jiaqiyan@xxxxxxxxxx
[5] https://lwn.net/Articles/912017
[6] https://lore.kernel.org/linux-mm/20240828234958.GE3773488@xxxxxxxxxx/T/#m34d054d967a72ad8a7c8120c19447b415fd12179
[7] An example of an MMIO BAR is Nvidia's GB200: in passthrough mode, a VM can access nearly half of its 196GB of HBM per card [7.1]. An example of kernel-unmanaged host primary memory is Nvidia's Extended GPU Memory (EGM) [7.2], which lets the ~400GB of LPDDR5 DIMMs per socket on the host not only back VM memory but also be accessed by the GPU at high speed. Both HBM and EGM are exposed to the VM via VM_PFNMAP under the hood, and MFR for both is important because ML workloads require long VM uptime.
[7.1] https://www.nvidia.com/en-us/data-center/gb200-nvl72/?ncid=pa-srch-goog-739865&_bt=709953060161&_bk=nvidia%20blackwell%20tensor%20core%20gpus&_bm=p&_bn=g&_bg=169122792888&gad_source=1&gclid=Cj0KCQjwz7C2BhDkARIsAA_SZKbHWgnjAA_0Ve8niwtx9FooW-bgzehdRkDnoke-zIKafDaVu9d75eEaAjc_EALw_wcB
[7.2] https://developer.nvidia.com/blog/nvidia-grace-hopper-superchip-architecture-in-depth/#extended_gpu_memory
[8] https://lore.kernel.org/lkml/20231123003513.24292-2-ankita@xxxxxxxxxx/#t
[9] https://lore.kernel.org/linux-mm/20240828234958.GE3773488@xxxxxxxxxx/T/#m413a61acaf1fc60e65ee7968ab0ae3093f7b1ea3
[10] https://docs.google.com/drawings/d/1Dmx2sxUGyRWdA1-5-HVko6IpsFL6PYAYL0ZL8T8AhY4
[11] https://docs.google.com/drawings/d/1E4m5Zy6_JFLmsacM3Z8FU6LLxLiTPMxvbmf4gzZhN6c
[12] https://docs.google.com/drawings/d/1hEe2BuEEJAlnqE4cjiZc-eBLjrkUk4BwOPDyL7TDClw
[13] https://docs.google.com/drawings/d/1u4er__Bziwn7itijOwghXhfu-JrXMDhnfFVu62BTzr0
[14] https://lore.kernel.org/all/20240524215306.2705454-2-jane.chu@xxxxxxxxxx/T/#mbd530effd89d50eef7e9dd9375b900e7e34803c1

Jiaqi Yan (2):
  mm/memory-failure: introduce global MFR policy
  docs: mm: add enable_hard_offline sysctl

 Documentation/admin-guide/sysctl/vm.rst | 92 +++++++++++++++++++++++++
 mm/memory-failure.c                     | 33 +++++++++
 2 files changed, 125 insertions(+)

--
2.46.0.792.g87dc391469-goog