Re: [x86 RAS question] what is the behavior that guest happen machine check exceptions(MCEs) on page table

> On 23/11/2017 04:18, gengdongjiu wrote:
> > Hi Ying/Guangrong, sorry to disturb you. I have a question about x86
> > RAS: if a machine check exception (MCE) hits a page table in the
> > guest, such as a guest application's page table, what is the
> > behavior? For a normal data RAS error, the host delivers SIGBUS to
> > QEMU, and QEMU injects an MCE into the guest. But if the error is in
> > a page table, the host should not deliver SIGBUS,
> 
> The host will deliver SIGBUS even in this case; the host doesn't know what the page is used for.

Paolo,
   Thanks for the reply.

I am afraid it will not be delivered.

The host has two chances to deliver SIGBUS: one is in the memory error handler
(memory_failure()); the other is in KVM.

When hardware detects a memory error on a stage2 page table page
(e.g. through memory scrubbing running in the background), an SRAO MCE is
triggered and the guest traps to the host, where the host kernel invokes the
memory error handler. But the memory error handler (memory_failure()) does
nothing except set the poisoned flag on the page table page, because there is
currently no way to isolate a page table page. The main problem is that no one
can easily tell which processes own the page table page.
So the faulty page remains open for access, and no SIGBUS is delivered at this point.[1]

Later, some CPU tries to access the stage2 page table page, which triggers
the more severe SRAR MCE, and the guest traps to the host again. KVM then
checks whether the error address has the poisoned flag set and, if so,
delivers a BUS_MCEERR_AR SIGBUS. I am afraid the address that KVM checks
is the target address of the translation,[2] not the address of the page
table itself. So the SIGBUS may again not be delivered.
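
To make the concern concrete, here is a toy userspace model of the check. It
is not kernel code: the types, the frame numbers, and both function names are
hypothetical, and the second function is there only for contrast, not as a
proposed fix. It assumes a guest access touches two host frames, the stage2
table page that is walked and the final target page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
enum { NPAGES = 8 };

struct host_mem {
	bool poisoned[NPAGES];	/* stands in for the per-page poisoned flag */
};

/* Host frames touched by one guest access. */
struct guest_access {
	size_t table_pfn;	/* stage2 page table page walked for the access */
	size_t target_pfn;	/* page the translation finally points to */
};

/* The check as I understand it: only the translation target is tested. */
static bool sigbus_if_target_poisoned(const struct host_mem *m,
				      const struct guest_access *a)
{
	return m->poisoned[a->target_pfn];
}

/* For contrast only: a check that also looks at the table page. */
static bool sigbus_if_any_poisoned(const struct host_mem *m,
				   const struct guest_access *a)
{
	return m->poisoned[a->table_pfn] || m->poisoned[a->target_pfn];
}
```

If frame 3 is a poisoned table page and the access targets a healthy frame 5,
the first check reports nothing while the second one does.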

[1] https://lkml.org/lkml/2017/10/30/773
[2] When a RAS error hits a stage2 page table, the pfn checked here should not be the page table address.

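static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
{
	/* pfn is the lookup result for gfn, the guest frame being accessed */
	if (pfn == KVM_PFN_ERR_HWPOISON) {
		/* signal the current task at the host virtual address backing gfn */
		kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn), current);
		return 0;
	}

	return -EFAULT;
}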
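
For reference, the userspace side of this signal can be sketched as below.
This is only a minimal illustration using the standard sigaction() interface,
not QEMU's actual handler; the enum and the function names are made up for the
example, and the fallback values for the two si_code constants assume Linux.
It shows how a process can tell an action-required error (BUS_MCEERR_AR) from
an action-optional one (BUS_MCEERR_AO) and read the faulting address.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Fallbacks for older headers; values as used on Linux. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification used only in this sketch. */
enum mce_kind { MCE_NONE, MCE_ACTION_REQUIRED, MCE_ACTION_OPTIONAL };

static enum mce_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return MCE_ACTION_REQUIRED;	/* poisoned data was consumed */
	case BUS_MCEERR_AO:
		return MCE_ACTION_OPTIONAL;	/* error found, not yet consumed */
	default:
		return MCE_NONE;		/* ordinary SIGBUS */
	}
}

static volatile sig_atomic_t last_kind = MCE_NONE;
static void *volatile last_addr;

/* Record the kind of error and the affected address. */
static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;
	last_kind = classify_sigbus(info->si_code);
	last_addr = info->si_addr;
}

/* Returns 0 on success, -1 on failure, like sigaction(). */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A handler like this only ever runs if the host actually sends the signal,
which is exactly what is in doubt for the page table case.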

> 
> If the MCE happens on the *host* page tables, I think QEMU will be killed.
> 
> Paolo
> 
> > then the guest will not know that an error happened. In this case, what
> > does KVM do, terminate the VM? I cannot find the handling logic.
> > Thanks! I look forward to your reply.
> >
> >




