Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, Huang-san,

(2010/10/08 12:15), Huang Ying wrote:
> Hi, Seto,
> 
> On Thu, 2010-10-07 at 11:41 +0800, Hidetoshi Seto wrote:
>> (2010/10/07 3:10), Dean Nelson wrote:
>>> On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:
>>>> On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
>>>>> I got some more question:
>>>>>
>>>>> (2010/10/05 3:54), Marcelo Tosatti wrote:
>>>>>> Index: qemu/target-i386/cpu.h
>>>>>> ===================================================================
>>>>>> --- qemu.orig/target-i386/cpu.h
>>>>>> +++ qemu/target-i386/cpu.h
>>>>>> @@ -250,16 +250,32 @@
>>>>>>   #define PG_ERROR_RSVD_MASK 0x08
>>>>>>   #define PG_ERROR_I_D_MASK  0x10
>>>>>>
>>>>>> -#define MCG_CTL_P    (1UL<<8)   /* MCG_CAP register available */
>>>>>> +#define MCG_CTL_P    (1ULL<<8)   /* MCG_CAP register available */
>>>>>> +#define MCG_SER_P    (1ULL<<24) /* MCA recovery/new status bits */
>>>>>>
>>>>>> -#define MCE_CAP_DEF    MCG_CTL_P
>>>>>> +#define MCE_CAP_DEF    (MCG_CTL_P|MCG_SER_P)
>>>>>>   #define MCE_BANKS_DEF    10
>>>>>>
>>>>>
>>>>> It seems that current kvm doesn't support SER_P, so injecting SRAO
>>>>> to guest will mean that guest receives VAL|UC|!PCC and RIPV event
>>>>> from virtual processor that doesn't have SER_P.
>>>>
>>>> Dean also noted this. I don't think it was deliberate choice to not
>>>> expose SER_P. Huang?
>>>
>>> In my testing, I found that MCG_SER_P was not being set (and I was
>>> running on a Nehalem-EX system). Injecting a MCE resulted in the
>>> guest entering into panic() from mce_panic(). If crash_kexec()
>>> finds a kexec_crash_image the system ends up rebooting, otherwise,
>>> what happens next requires operator intervention.
>>
>> Good to know.
>> What I'm concerning is that if memory scrubbing SRAO event is
>> injected when !SER_P, linux guest with certain mce tolerant level
>> might grade it as "UC" severity and continue running with none of
>> panicking, killing and poisoning because of !PCC and RIPV.
>>
>> Could you provide the panic message of the guest in your test?
>> I think it can tell me why the mce handler decided to go panic.
> 
> That is a bug that the SER_P is not in KVM_MCE_CAP_SUPPORTED in kernel.
> I will fix it as soon as possible. And SRAO MCE should not be sent
> when !SER_P, we should add that condition in qemu-kvm.

That makes sense.
I think it is qemu's responsibility for what follows the AO-SIGBUS,
what action should be taken depends on the KVM's capability.

>>> When I applied a patch to the guest's kernel which forces mce_ser to be
>>> set, as if MCG_SER_P was set (see __mcheck_cpu_cap_init()), I found
>>> that when the memory page was 'owned' by a guest process, the process
>>> would be killed (if the page was dirty), and the guest would stay
>>> running. The HWPoisoned page would be sidelined and not cause any more
>>> issues.
>>
>> Excellent.
>> So while guest kernel knows which page is poisoned, guest processes
>> are controlled not to touch the page.
>>
>> ... Therefore rebooting the vm and renewing kernel will lost the
>> information where is poisoned.
> 
> Yes. That is an issue. Dean suggests that make qemu-kvm to refuse reboot
> the guest if there is poisoned page and ask for user to intervention. I
> have another idea to replace the poison pages with good pages when
> reboot, that is, recover without user intervention.

Sounds good.

I think it may be worth something to reserve pages for the replacement
before reboot is requested; at least we really don't want to fail
rebooting with 'no memory'.

>>>>> I think most OSes don't expect that it can receives MCE with !PCC
>>>>> on traditional x86 processor without SER_P.
>>>>>
>>>>> Q1: Is it safe to expect that guests can handle such !PCC event?
>>>
>>> This might be best answered by Huang, but as I mentioned above, without
>>> MCG_SER_P being set, the result was an orderly system panic on the
>>> guest.
>>
>> Though I'll wait Huang (I think he is on holiday), I believe that
>> system panic is just a possible option for AO (Action Optional)
>> event, no matter how the SER_P is.
> 
> We should fix this as I said above.
> 
>>>>> Q2: What is the expected behavior on the guest?
>>>
>>> I think I answered this above.
>>
>> Yeah, thanks.
>>
>>>
>>>>> Q3: What happen if guest reboots itself in response to the MCE?
>>>
>>> That depends...
>>>
>>> And the following issue also holds for a guest that is rebooted at
>>> some point having successfully sidelined the bad page.
>>>
>>> After the guest has panic'd, a system_reset of the guest or a restart
>>> initiated by crash_kexec() (called by panic() on the guest), usually
>>> results in the guest hanging because the bad page still belongs
>>> to qemu-kvm and is now being referenced by the new guest in some way.
>>
>> Yes. In other words my concern about reboot is that new guest kernel
>> including kdump kernel might try to read the bad page.  If there is
>> no AR-SIGBUS etc., we need some tricks to inhibit such accesses.
>>
>>> (It actually may not hang, but successfully reboot and be runnable,
>>> with the bad page lurking in the background. It all seems to depend on
>>> where the bad page ends up, and whether it's ever referenced.)
>>
>> I know some tough guys using their PC with buggy DIMMs :-)
>>
>>>
>>> I believe there was an attempt to deal with this in kvm on the host.
>>> See kvm_handle_bad_page(). This function was suppose to result in the
>>> sending of a BUS_MCEERR_AR flavored SIGBUS by do_sigbus() to qemu-kvm
>>> which in theory would result in the right thing happening. But commit
>>> 96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being
>>> sent. So this mechanism needs to be re-worked, and the issue remains.
>>
>> Definitely.
>> I guess Huang has some plan or hint for rework this point.
> 
> Yes. This should be fixed. The SRAR SIGBUS should be sent directly
> instead of being sent via touching poisoned virtual address.

Good. It should work.

>>> I would think that if the the bad page can't be sidelined, such that
>>> the newly booting guest can't use it, then the new guest shouldn't be
>>> allowed to boot. But perhaps there is some merit in letting it try to
>>> boot and see if one gets 'lucky'.
>>
>> In case of booting a real machine in real world, hardware and firmware
>> usually (or often) do self-test before passing control to OS.
>> Some platform can boot OS with degraded configuration (for example,
>> fewer memory) if it has trouble on its component.  Some BIOS may
>> stop booting and show messages like "please reseat [component]" on the
>> screen.  So we could implement/request qemu to have such mechanism.
>>
>> I can understand the merit you mentioned here, in some degree. But I
>> think it is hard to say "unlucky" to customer in business...
> 
> Because the contents of poisoned pages are not relevant after reboot.
> Qemu can replace the poisoned pages with good pages when reboot guest.
> Do you think that is good.

Sure.

Of course this trick will not needed if user has done migration or
save/restore the guest before a reboot.

Thank you for answering!


Thanks,
H.Seto 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux