On 9/15/2021 1:27 PM, Dan Williams wrote:
>> I'm also thinking about the MOVDIR64B instruction and how it
>> might be used to clear poison on the fly with a single 'store'.
>> Of course, that means we need to figure out how to narrow down the
>> error blast radius first.
>
> It turns out the MOVDIR64B error clearing idea runs into a problem with
> the device poison tracking. Without an explicit notification that
> software wanted the error cleared, the device may ghost-report errors
> that are no longer there. I think we should continue with explicit error
> clearing and notification of the device that the error has been
> cleared (by asking the device to clear it).

Sorry for the late response; I was out for several days.

Your concern is understood. I wasn't thinking of an out-of-band
MOVDIR64B to clear poison. I was thinking about adding a case to
pmem_clear_poison(), such that if the CPUID feature bit shows that
MOVDIR64B is supported, MOVDIR64B could be used instead of calling
the BIOS interface to clear poison. The advantages are:
a. it is a lot faster; b. the blast radius is smaller. And the driver
has a chance to update its ->bb record.
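
To make the idea concrete, here is a rough user-space sketch of the
branch I have in mind. It is not the real pmem_clear_poison(): the
struct and every mock_* name below are hypothetical stand-ins, the
"MOVDIR64B" store is simulated with memset(), and the poisoned[] array
only plays the role of the ->bb record. It also does not model the
device-side poison tracking you raised, so treat it as an illustration
of the control flow only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE 64u	/* MOVDIR64B writes one 64-byte line */
#define NLINES    8u	/* size of the mock region, in cache lines */

/* Hypothetical stand-in for a pmem region plus its bad-block record. */
struct mock_pmem {
	uint8_t  data[NLINES * CACHELINE];
	bool     poisoned[NLINES];	/* plays the role of ->bb */
	bool     has_movdir64b;		/* would come from a CPUID check */
	unsigned bios_calls;		/* how often the slow path ran */
	unsigned direct_stores;		/* how many 64-byte stores ran */
};

/* Stand-in for the BIOS clear-poison call: one call covers the range. */
static void mock_bios_clear(struct mock_pmem *p, size_t first, size_t n)
{
	size_t i;

	for (i = first; i < first + n; i++) {
		if (!p->poisoned[i])
			continue;
		memset(p->data + i * CACHELINE, 0, CACHELINE);
		p->poisoned[i] = false;
	}
	p->bios_calls++;
}

/* Stand-in for a single MOVDIR64B store of zeros to one cache line. */
static void mock_movdir64b_zero(struct mock_pmem *p, size_t line)
{
	memset(p->data + line * CACHELINE, 0, CACHELINE);
	p->direct_stores++;
}

/*
 * Clear poison in [off, off + len). Both must be cache-line aligned.
 * Returns the number of poisoned lines cleared, 0 on a bad range.
 */
static size_t mock_clear_poison(struct mock_pmem *p, size_t off, size_t len)
{
	size_t first, n, i, cleared = 0;

	if (len == 0 || off % CACHELINE || len % CACHELINE)
		return 0;
	first = off / CACHELINE;
	n = len / CACHELINE;
	if (first >= NLINES || n > NLINES - first)
		return 0;

	if (!p->has_movdir64b) {
		/* Existing behaviour: hand the whole range to the BIOS. */
		for (i = first; i < first + n; i++)
			if (p->poisoned[i])
				cleared++;
		if (cleared)
			mock_bios_clear(p, first, n);
		return cleared;
	}

	/* Proposed case: touch only the poisoned lines, one store each. */
	for (i = first; i < first + n; i++) {
		if (!p->poisoned[i])
			continue;
		mock_movdir64b_zero(p, i);
		p->poisoned[i] = false;	/* driver updates its record */
		cleared++;
	}
	return cleared;
}
```

With has_movdir64b set, a region with one poisoned line costs one
64-byte store and no BIOS call; without it, the same request falls
back to a single BIOS call over the range.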
thanks,
-jane