On Tue, 11 Jun 2024, Jiaqi Yan wrote:

> @@ -267,6 +268,20 @@ used::
>  These are informational only. They do not mean that anything is wrong
>  with your system. To disable them, echo 4 (bit 2) into drop_caches.
>
> +enable_soft_offline
> +===================
> +Control whether to soft offline memory pages that have (excessive) correctable
> +memory errors. It is your call to choose between reliability (stay away from
> +fragile physical memory) vs performance (brought by HugeTLB or transparent
> +hugepages).
> +

Could you expand upon the relevance of HugeTLB or THP in this
documentation? I understand the need in some cases to soft offline memory
after a number of correctable memory errors, but it's not clear how the
performance implications play into this. The paragraph below goes into a
difference in the splitting behavior; are hugepage users the only ones
who should be concerned with this?

> +When setting to 1, kernel attempts to soft offline the page when it thinks
> +needed. For in-use page, page content will be migrated to a new page. If
> +the oringinal hugepage is a HugeTLB hugepage, regardless of in-use or free,

s/oringinal/original/

> +it will be dissolved into raw pages, and the capacity of the HugeTLB pool
> +will reduce by 1. If the original hugepage is a transparent hugepage, it
> +will be split into raw pages. When setting to 0, kernel won't attempt to
> +soft offline the page. Its default value is 1.
>

Is this behavior the same for all architectures?