On Tue, 2021-09-28 at 13:53 -0700, Luck, Tony wrote: > On Tue, Sep 28, 2021 at 11:11:30PM +0300, Jarkko Sakkinen wrote: > > On Tue, 2021-09-28 at 15:41 +0000, Luck, Tony wrote: > > > > > Add debugfs files /sys/kernel/debug/sgx/poison_page_list so that system > > > > > administrators get a list of those pages that have been dropped because > > > > > of poison. > > > > > > > > So, what would a sysadmin do with that detailed information? > > > > > > It's going to be a rare case that there are any poisoned pages on that list > > > (a large enough cluster will have some systems that have uncorrected > > > recoverable errors in SGX EPC memory). > > > > > > Even when there are some poisoned pages, there will only be a few. Systems > > > that have thousands of pages with uncorrected memory errors will surely crash > > > because one of those errors is going to either trigger an error marked as fatal, > > > or the error won’t be recoverable by Linux because it is in kernel memory. > > > > > > A sysadmin might add a script to run during system shutdown (or periodically > > > during run-time) to save the poison page list. Then at startup run: > > > > > > for addr in `cat saved_sgx_poison_page_list` > > > do > > > echo $addr > /sys/devices/system/memory/hard_offline_page > > > done > > > > > > to make poison persistent across reboots. > > > > > > -Tony > > > > Couldn't it be a blob with 8 bytes for each address? > > It could be a blob. But that would require some perl/python > instead of simple shell to do the above persistence trick. The way I've understood it, a list of values breaks sysfs conventions. There can be only single value per attribute. Even, if the blob is interpreted as a list of integers, it is still a value, as far as sysfs is concerned. I'd also consider programs written with C, or perhaps Rust, when we (ever) add any new sysfs for SGX. In my opinion, it makes sense to make any uapi things we add accesible to as many tools as we can. Such a trivially constructed blob is not enormously hard to parse in any language, but at least I don't enjoy parsing list of strings in C code, whereas loading a blob is effortless. This kind of shows why the current sysfs conventions make sense in the first place: they enforce to design attributes in the manner that they are as reachable as possible. That's why I would follow the conventions in a strict manner. Finally, I would make a proper sysfs attribute out of this (and a separate patch), which would be available per node. /Jarkko