On Saturday, 12 November 2022 15:32:47 PST Luck, Tony wrote:
> > Because if this is going to be run during downtime, as Thiago says, then
> > you can just as well use debugfs for this. And then there's no need to
> > cast any API in stone and so on.
>
> Did Thiago say “during downtime”? I think
> he talked about some users opportunistic
> use of scan tests. But that’s far from only
> during downtime. We fully expect CSPs to
> run these scans periodically on production
> machines.

Let me clarify. I did not mean full system downtime for maintenance, but I did
mean that there's a gap in consumer workload, for both threads of one or more
cores. As Tony said, it should have little observable effect on any other
core, meaning an IFS run can be scheduled *as* any other workload (albeit a
privileged one) for a subset of the machine, while the rest of the system
remains in production.

This allows them a lot of flexibility and is the reason I am talking about
containers, with the implied constraint that the container's view of the
filesystem is narrower than the kernel's.

There'll be some coordination required to get all cores to have run all tests,
but it should be doable over a period of time, and I'm thinking days, not
years. This should still be short enough to reveal if the system can detect a
defect or wear-out before any real workload is impacted by it.

If an issue is detected, the admin can decide whether to offline the core(s)
reporting problems but keep the rest serving workloads and generating revenue,
or offline the entire machine for full maintenance and to run more invasive
and time-consuming tests.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Cloud Software Architect - Intel DCAI Cloud Engineering