Re: [x86 RAS question] what is the behavior when a guest hits machine check exceptions (MCEs) on a page table

On 24/11/2017 01:52, gengdongjiu wrote:
>> On 23/11/2017 04:18, gengdongjiu wrote:
>>> Hi ying/guangrong, sorry to disturb you, but I have a question for
>>> you. For x86 RAS, if the guest OS hits a machine check
>>> exception (MCE) in a page table, such as a guest application's page
>>> table, what is the behavior? For a normal data RAS error, the host
>>> will deliver SIGBUS to QEMU, and QEMU will inject an MCE into the guest. But if
>>> the RAS error is in a page table, the host should not deliver SIGBUS,
>>
>> The host will deliver SIGBUS even in this case; the host doesn't know what the page is used for.
> 
> Paolo,
>    Thanks for the reply.
> 
> I am afraid it will not be delivered.
> 
> The host has two chances to deliver SIGBUS: one is in the memory error handler (memory_failure());
> the other is in KVM.
> 
> When hardware detects a memory error on a stage2 page table page

The stage2 page tables are not guest application page tables; those are
hypervisor page tables.  In that case, indeed, the host would kill the guest.

Paolo

> (e.g. memory scrubbing running in the background), an SRAO MCE is triggered,
> the guest traps to the host, and the host kernel invokes the memory error handler.
> But the memory error handler (memory_failure()) does nothing except
> set the poisoned flag on this page table page, because there is currently
> no way to isolate a page table page. The main problem is that no
> one easily knows "which processes own the page table page."
> So the error page is still accessible, and no SIGBUS is delivered here.[1]
> 
> Then, when some CPU later tries to access the stage2 page table page, a
> more severe SRAR MCE is triggered and the guest traps to the host again. KVM checks
> whether the error address has the poisoned flag and, if so, delivers a
> BUS_MCEERR_AR SIGBUS. However, I am afraid the address KVM uses for this
> check is the target address being translated to,[2] not the address of the page table
> itself. So the SIGBUS signal may not be delivered here either.
> 
> [1] https://lkml.org/lkml/2017/10/30/773
> [2] When a stage2 page table RAS error happens, the pfn being checked is not the page table's address.
> 
> static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> {
> 	if (pfn == KVM_PFN_ERR_HWPOISON) {
> 		kvm_send_hwpoison_signal(kvm_vcpu_gfn_to_hva(vcpu, gfn), current);
> 		return 0;
> 	}
> 
> 	return -EFAULT;
> }
> 
>>
>> If the MCE happens on the *host* page tables, I think QEMU will be killed.
>>
>> Paolo
>>
>>> then the guest will not know that an error happened. In this case, what is the
>>> behavior of KVM: does it terminate the VM? I cannot find the handling logic.
>>> Thanks! I look forward to your reply.
>>>
>>>
> 



