On Tue, 2021-09-28 at 15:41 +0000, Luck, Tony wrote:
> > > Add debugfs files /sys/kernel/debug/sgx/poison_page_list so that system
> > > administrators get a list of those pages that have been dropped because
> > > of poison.
> >
> > So, what would a sysadmin do with that detailed information?
>
> It's going to be a rare case that there are any poisoned pages on that list
> (a large enough cluster will have some systems that have uncorrected
> recoverable errors in SGX EPC memory).
>
> Even when there are some poisoned pages, there will only be a few. Systems
> that have thousands of pages with uncorrected memory errors will surely crash
> because one of those errors is going to either trigger an error marked as fatal,
> or the error won’t be recoverable by Linux because it is in kernel memory.
>
> A sysadmin might add a script to run during system shutdown (or periodically
> during run-time) to save the poison page list. Then at startup run:
>
> for addr in `cat saved_sgx_poison_page_list`
> do
> 	echo $addr > /sys/devices/system/memory/hard_offline_page
> done
>
> to make poison persistent across reboots.
>
> -Tony

Couldn't it be a blob with 8 bytes for each address?

/Jarkko
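
To make the blob idea concrete, here is a rough sketch of what the startup side might look like. It is not taken from the patch. It assumes the saved file is a packed array of host-endian 64-bit physical addresses, and the function names and the file name saved_sgx_poison_blob are made up for illustration:

```shell
#!/bin/sh
# Sketch only: assumes the blob is a packed array of host-endian 64-bit
# physical addresses. The blob format and all names here are hypothetical.

# Print one 0x-prefixed address per 8-byte record in the blob ($1).
blob_to_addrs() {
	od -An -v -tx8 "$1" | tr -s ' ' '\n' |
		sed -n 's/^\([0-9a-f]\{16\}\)$/0x\1/p'
}

# Write each address from the blob ($1) to the offline file ($2).
# The target is a parameter so the function can be tried on a scratch file.
offline_from_blob() {
	blob_to_addrs "$1" | while read -r addr
	do
		echo "$addr" > "$2"
	done
}

# Example (as root, at startup):
# offline_from_blob saved_sgx_poison_blob /sys/devices/system/memory/hard_offline_page
```

The text list in the quoted script needs no decoding step at all, so the blob variant mainly pays off if something other than a shell script consumes it.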