On Sat, Jan 9, 2016 at 4:23 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Sat, Jan 9, 2016 at 2:33 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> Shouldn't that logic live in the mcsafe_copy routine itself rather
>> than being delegated to callers?
>>
>
> Yes, please.

Yes - we should have some of that fancy self-patching code that
redirects to the optimal routine for the cpu model we are running
on.

BUT ... it's all going to be very messy. We don't have any CPUID
capability bits to say whether we support recovery, or which
instructions are good/bad choices for recovery.

You might think that MCG_CAP{24}, which is described as "software
error recovery" (or some such), would be a good clue, but you'd be
wrong. The bit got a little overloaded and there are cpus that set
it, but won't recover.

Only Intel(R) Xeon(R) branded cpus can recover, but not all of them.
The story so far:

Nehalem, Westmere: E7 models support SRAO recovery (patrol scrub,
cache eviction). Not relevant for this e-mail thread.

Sandy Bridge: Some "advanced RAS" SKUs will recover from poison
reads (these have E5 model names, there was no E7 in this generation).

Ivy Bridge: Xeon E5-* models do not recover. E7-* models do recover.
Note E5 and E7 have the same CPUID model number.

Haswell: Same as Ivy Bridge.

Broadwell/Skylake: Xeon not released yet ... can't talk about them.

Linux code recently got some recovery bits for AMD cpus ... I don't
know what the story is on which models support this.

-Tony
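
For illustration only, here is a rough user-space sketch of the decision table described in the mail above. It is not kernel code and not an existing API: the names poison_read_recovery(), enum cpu_gen and enum recovery are hypothetical, and the caller is assumed to have already read IA32_MCG_CAP and the CPUID brand string. Treating a clear bit 24 as "no recovery" is my assumption; the mail only says that a set bit is not sufficient. Matching "E5-"/"E7-" in the brand string is just one possible way to tell apart parts that share a CPUID model number.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit 24 of IA32_MCG_CAP ("software error recovery"). Per the mail,
 * a set bit does NOT guarantee that the cpu will actually recover. */
#define MCG_SER_P_BIT (1ULL << 24)

/* Hypothetical generation tag; a real implementation would derive it
 * from CPUID family/model. */
enum cpu_gen {
	GEN_NEHALEM,
	GEN_WESTMERE,
	GEN_SANDY_BRIDGE,
	GEN_IVY_BRIDGE,
	GEN_HASWELL,
	GEN_OTHER,	/* Broadwell, Skylake, anything not listed */
};

enum recovery { RECOVERY_NO, RECOVERY_YES, RECOVERY_UNKNOWN };

/* Best guess at whether a poison read (not SRAO) is recoverable. */
static enum recovery poison_read_recovery(uint64_t mcg_cap,
					  enum cpu_gen gen,
					  const char *brand)
{
	/* Assumption: without the capability bit there is no recovery. */
	if (!(mcg_cap & MCG_SER_P_BIT))
		return RECOVERY_NO;

	/* Only Xeon branded parts can recover at all. */
	if (brand == NULL || strstr(brand, "Xeon") == NULL)
		return RECOVERY_NO;

	switch (gen) {
	case GEN_NEHALEM:
	case GEN_WESTMERE:
		/* E7 models do SRAO recovery only, not poison reads. */
		return RECOVERY_NO;
	case GEN_SANDY_BRIDGE:
		/* Only some "advanced RAS" E5 SKUs; the name can't tell. */
		return RECOVERY_UNKNOWN;
	case GEN_IVY_BRIDGE:
	case GEN_HASWELL:
		/* Same CPUID model number, so fall back to the name. */
		if (strstr(brand, "E7-") != NULL)
			return RECOVERY_YES;
		if (strstr(brand, "E5-") != NULL)
			return RECOVERY_NO;
		return RECOVERY_UNKNOWN;
	default:
		return RECOVERY_UNKNOWN;
	}
}
```

A self-patching mcsafe_copy could consult something like this once at boot and pick its routine accordingly, treating RECOVERY_UNKNOWN the same as RECOVERY_NO to stay on the safe side.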