On 2011-02-10 01:27, Huang Ying wrote:
> On Wed, 2011-02-09 at 16:00 +0800, Jan Kiszka wrote:
>> On 2011-02-09 04:00, Huang Ying wrote:
>>> In Linux kernel HWPoison processing implementation, the virtual
>>> address in processes mapping the error physical memory page is marked
>>> as HWPoison. So that, the further accessing to the virtual
>>> address will kill corresponding processes with SIGBUS.
>>>
>>> If the error physical memory page is used by a KVM guest, the SIGBUS
>>> will be sent to QEMU, and QEMU will simulate a MCE to report that
>>> memory error to the guest OS. If the guest OS can not recover from
>>> the error (for example, the page is accessed by kernel code), guest OS
>>> will reboot the system. But because the underlying host virtual
>>> address backing the guest physical memory is still poisoned, if the
>>> guest system accesses the corresponding guest physical memory even
>>> after rebooting, the SIGBUS will still be sent to QEMU and MCE will be
>>> simulated. That is, guest system can not recover via rebooting.
>>
>> Yeah, saw this already during my test...
>>
>>>
>>> In fact, across rebooting, the contents of guest physical memory page
>>> need not to be kept. We can allocate a new host physical page to
>>> back the corresponding guest physical address.
>>
>> I just wondering what would be architecturally suboptimal if we simply
>> remapped on SIGBUS directly. Would save us at least the bookkeeping.
>
> Because we can not change the content of memory silently during guest OS
> running, this may corrupts guest OS data structure and even ruins disk
> contents. But during rebooting, all guest OS state are discarded.

I was not talking about remapping more than just the pages that became
inaccessible, just like you do now. But I guess the problem is rather
that insane guests continuing to access those pages before reboot should
also still receive MCEs.

Jan
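
For reference, the bookkeeping-plus-remap idea discussed above could look roughly like the sketch below. This is not the code from the patch under review: the names record_poisoned_page() and remap_poisoned_pages_on_reset(), the fixed-size list, and the assumption that guest RAM is private anonymous memory are all made up for illustration (file-backed or hugetlbfs guest RAM would need different mmap() arguments). The first helper would be called from the SIGBUS path, the second at guest reset, so a guest touching the page before reboot still gets its MCE.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical fixed-size bookkeeping list of poisoned host addresses. */
#define MAX_POISONED 64

static void *poisoned[MAX_POISONED];
static size_t n_poisoned;

/* Remember a page-aligned host virtual address reported as poisoned.
 * Nothing is remapped here, so the mapping stays poisoned until reset. */
static int record_poisoned_page(void *addr)
{
    if (n_poisoned == MAX_POISONED)
        return -1;
    poisoned[n_poisoned++] = addr;
    return 0;
}

/* At guest reset: map fresh zero-filled anonymous memory over every
 * recorded page, keeping the same host virtual address (MAP_FIXED).
 * Assumes private anonymous guest RAM. The list is cleared even if
 * one of the remaps fails; the failure is reported via the return value. */
static int remap_poisoned_pages_on_reset(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int ret = 0;

    assert(n_poisoned <= MAX_POISONED);
    for (size_t i = 0; i < n_poisoned; i++) {
        void *p = mmap(poisoned[i], page, PROT_READ | PROT_WRITE,
                       MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            ret = -1;
    }
    n_poisoned = 0;
    return ret;
}
```

Remapping directly in the SIGBUS handler would drop the list, but it would also silently replace memory contents under a running guest, which is the objection raised in the quoted reply.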