On 28/06/2017 13:53, Christian Borntraeger wrote:
> This is probably more a question for Martin or Heiko, but our initial machine
> check handling seems to be older than the hwpoison infrastructure (older than
> the 2.6 git history). Historically we have killed processes with SIGSEGV on
> fatal errors and never included the BUS_MCEERR things so we still kill processes
> on errors. From an architectural point of view we can get the failing address,
> so maybe we should consider the BUS_MCEERR things for memory errors.

Also because other architectures use SIGBUS, and QEMU uses SIGBUS too.

> On the other hand since z196 (2010) the memory is protected with RAIM (in addition
> to ECC) and I am not aware of any field incidence where HW was not able to recover
> since the introduction of RAIM the pressure to do that is pretty small.
>
> If we decide to do that, this would require additional changes for KVM - we would then
> need to translate the host address into a guest address or as V1 unset the valid
> bit for the failing address information.

That's fine.

Does s390 also do background scrubbing of memory? That would result in
action-optional SIGBUS (siginfo->si_code == BUS_MCEERR_AO) to programs
that request them with prctl; QEMU does, in qemu_init_sigbus. These
forward the scrubbing results to the guest and avoid a later
action-required MCE.

Paolo
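
For reference, a minimal userspace sketch of the opt-in described above, assuming Linux and glibc. The prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0) call and the BUS_MCEERR_AO / BUS_MCEERR_AR si_code values are the real interfaces. The names classify_sigbus, install_mce_sigbus_handler, enum mce_kind, last_kind and last_addr are hypothetical, made up for this illustration, and this is not QEMU's actual code.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical classification, used only in this sketch. */
enum mce_kind { MCE_NONE, MCE_ACTION_OPTIONAL, MCE_ACTION_REQUIRED };

/* Map the si_code of a SIGBUS to the kind of memory error it reports. */
enum mce_kind classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO:   /* e.g. found in the background, not yet consumed */
        return MCE_ACTION_OPTIONAL;
    case BUS_MCEERR_AR:   /* the faulting access itself hit the bad page */
        return MCE_ACTION_REQUIRED;
    default:              /* ordinary SIGBUS, not a memory error report */
        return MCE_NONE;
    }
}

/* Last reported error, recorded by the handler (illustration only). */
volatile sig_atomic_t last_kind = MCE_NONE;
void *volatile last_addr;

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;

    enum mce_kind kind = classify_sigbus(info->si_code);

    if (kind == MCE_NONE) {
        /* Not a memory error: fall back to the default action. */
        signal(SIGBUS, SIG_DFL);
        raise(SIGBUS);
        return;
    }

    /*
     * si_addr is the failing host virtual address. A VMM would translate
     * it to a guest address and inject the error; this sketch only
     * records it. A real program must not simply retry the access after
     * an action-required error.
     */
    last_kind = kind;
    last_addr = info->si_addr;
}

/* Install the handler and request early (action-optional) delivery. */
int install_mce_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = SA_SIGINFO;          /* needed to receive siginfo_t */
    sa.sa_sigaction = sigbus_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    /* Without this, only action-required errors reach the process. */
    return prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);
}
```

A program would call install_mce_sigbus_handler() once at startup. Whether any action-optional signal ever arrives then depends on the kernel reporting such errors on the architecture in question, which is the open question for s390 here.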