On 24/11/2017 01:52, gengdongjiu wrote:
>> On 23/11/2017 04:18, gengdongjiu wrote:
>>> Hi ying/guangrong, sorry to disturb you, I have a question to consult
>>> with you. For the x86 RAS, if guest OS happen machine check
>>> exceptions(MCEs) in page table, such as guest application's page
>>> table, what is the behavior for it? For normal data RAS error, host
>>> will deliver SIGBUS to Qemu, and Qemu will inject MCE to guest. but if
>>> the page table happen RAS error, host should not deliver SIGBUS,
>>
>> The host will deliver SIGBUS even in this case; the host doesn't know what the page is used for.
>
> Paolo,
> Thanks for the reply.
>
> I am afraid it will not be delivered.
>
> Host has two chance to deliver SIGBUS: one is in memory error handler(memory_failure());
> Another is in KVM.
>
> When hardware detects a memory error on a stage2 page table page

The stage2 page tables are not guest application page table, those are
hypervisor page tables. In that case, indeed the host would kill the guest.

Paolo

> (f.e. memory scrubbing running in background), MCE SRAO is triggered,
> And guest trap to host, then the host kernel kicks memory error handler.
> But memory error handler(memory_failure()) does nothing except that
> set this page table address to poisoned flag. because there's currently
> no way to isolate the page table page. the main problem should be that no
> one easily knows "which processes owned the page table page."
> So the error page is still open for access, so here it does not deliver SIGBUS.[1]
>
> then later some CPU try to access the stage2 page table page, which
> triggers severer MCE SRAR, guest trap to host again, KVM will judge
> whether the error address has poisoned flag, if having, KVM will deliver
> BUS_MCEERR_AR SIGBUS. I am afraid the error address that KVM use to
> judge is the target address that want translation to,[2], not the page table
> itself address. So here the SIGBUS signal may be not delivered again.
>
> [1] https://lkml.org/lkml/2017/10/30/773
> [2] the pfn should be not page table address, when happen stage2 page table RAS error.
>
> static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> {
>     if (pfn == KVM_PFN_ERR_HWPOISON) {
>         kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn), current);
>         return 0;
>     }
>
>     return -EFAULT;
> }
>
>>
>> If the MCE happens on the *host* page tables, I think QEMU will be killed.
>>
>> Paolo
>>
>>> then guest will not know it happen error, for this case, what is the
>>> behavior for KVM, terminate the VM? I do not find the handling logic.
>>> Thanks! look forward your to reply.
>>>
>>>
>
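
For illustration only, here is a toy model of the concern raised in [2]: a
check keyed on the translation *target* does not notice poison on the page
that holds the translation table itself. Every name below (toy_stage2,
toy_poisoned, TOY_PFN_HWPOISON, ...) is hypothetical and is not a kernel
API. The sketch only restates the argument quoted above; it makes no claim
about what KVM or memory_failure() actually do in this situation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
#define TOY_NR_PAGES     8
#define TOY_PFN_HWPOISON UINT64_MAX /* stand-in for KVM_PFN_ERR_HWPOISON */

/* Per-page "poisoned" flag, indexed by pfn. */
static bool toy_poisoned[TOY_NR_PAGES];

/* One-level translation table that itself lives in page table_pfn. */
struct toy_stage2 {
    uint64_t table_pfn;                /* page holding the table */
    uint64_t target_pfn[TOY_NR_PAGES]; /* gfn -> pfn mapping */
};

/*
 * Resolve a gfn. Poison is reported only for the translation target;
 * the state of s2->table_pfn is never consulted.
 */
static uint64_t toy_gfn_to_pfn(const struct toy_stage2 *s2, uint64_t gfn)
{
    assert(gfn < TOY_NR_PAGES);
    uint64_t pfn = s2->target_pfn[gfn];
    assert(pfn < TOY_NR_PAGES);
    return toy_poisoned[pfn] ? TOY_PFN_HWPOISON : pfn;
}

/* Same shape as the quoted check: signal only on the poison sentinel. */
static bool toy_would_send_sigbus(const struct toy_stage2 *s2, uint64_t gfn)
{
    return toy_gfn_to_pfn(s2, gfn) == TOY_PFN_HWPOISON;
}
```

In this model, marking only the table page as poisoned leaves
toy_would_send_sigbus() false, and it becomes true only once the target
page is poisoned.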