Re: [PATCH] Ignore SRAO MCE if another MCE is being processed

On Wed, 2010-04-28 at 00:12 +0800, Marcelo Tosatti wrote:
> On Tue, Apr 27, 2010 at 03:10:49PM +0800, Huang Ying wrote:
> > In common cases, guest SRAO MCE will cause corresponding poisoned page
> > be un-mapped in host and SIGBUS be sent to QEMU-KVM, then QEMU-KVM
> > will relay the MCE to guest OS.
> > 
> > But it is possible that the poisoned page is accessed in guest after
> > un-mapped in host and before MCE is relayed to guest OS. So that, the
> > SRAR SIGBUS is sent to QEMU-KVM before the SRAO SIGBUS, and if
> > QEMU-KVM relays them to guest OS one by one, guest system may reset,
> > because the SRAO MCE may be triggered while the SRAR MCE is being
> > processed. In fact, the SRAO MCE can be ignored in this situation, so
> > that the guest system is given opportunity to survive.
> > 
> > Signed-off-by: Huang Ying <ying.huang@xxxxxxxxx>
> > ---
> >  qemu-kvm.c |   28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> > 
> > --- a/qemu-kvm.c
> > +++ b/qemu-kvm.c
> > @@ -1610,6 +1610,19 @@ static void flush_queued_work(CPUState *
> >      pthread_cond_broadcast(&qemu_work_cond);
> >  }
> >  
> > +static int kvm_mce_in_exception(CPUState *env)
> > +{
> > +    struct kvm_msr_entry msr_mcg_status = {
> > +        .index = MSR_MCG_STATUS,
> > +    };
> > +    int r;
> > +
> > +    r = kvm_get_msrs(env, &msr_mcg_status, 1);
> > +    if (r == -1 || r == 0)
> > +        return -1;
> > +    return !!(msr_mcg_status.data & MCG_STATUS_MCIP);
> > +}
> > +
> >  static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
> >  {
> >  #if defined(KVM_CAP_MCE) && defined(TARGET_I386)
> > @@ -1630,6 +1643,15 @@ static void kvm_on_sigbus(CPUState *env,
> >              mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
> >              mce.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV;
> >          } else {
> > +            /*
> > +             * If there is an MCE exception being processed, ignore
> > +             * this SRAO MCE
> > +             */
> > +            r = kvm_mce_in_exception(env);
> > +            if (r == -1)
> > +                fprintf(stderr, "Failed to get MCE status\n");
> > +            else if (r)
> > +                return;
> >              /* Fake an Intel architectural Memory scrubbing UCR */
> >              mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
> >                  | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
> > @@ -2475,6 +2497,12 @@ static void kvm_do_inject_x86_mce(void *
> >      struct kvm_x86_mce_data *data = _data;
> >      int r;
> >  
> > +    /* If there is an MCE exception being processed, ignore this SRAO MCE */
> > +    r = kvm_mce_in_exception(data->env);
> > +    if (r == -1)
> > +        fprintf(stderr, "Failed to get MCE status\n");
> > +    else if (r && !(data->mce->status & MCI_STATUS_AR))
> > +        return;
> 
> Don't you need to set the OVER bit in the MCI_STATUS register when 
> this happens?

The OVER bit is set when an uncorrected error overwrites a corrected
error. There is no specification of the OVER bit for this situation, and
I don't see any benefit in setting it.
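
To make "this situation" concrete, here is a minimal standalone sketch of
the decision the patch makes in kvm_do_inject_x86_mce(). It is not the
qemu-kvm code: mce_in_exception() and mce_should_drop() are hypothetical
names, the MSR read is replaced by plain arguments, and the two bit
positions are the architectural ones as I understand them (MCIP is bit 2
of MCG_STATUS, AR is bit 55 of MCi_STATUS).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed architectural bit positions, defined here so the sketch builds alone. */
#define MCG_STATUS_MCIP (1ULL << 2)   /* machine check in progress */
#define MCI_STATUS_AR   (1ULL << 55)  /* action required (SRAR) */

/*
 * Hypothetical stand-in for kvm_mce_in_exception(): msrs_read plays the
 * role of the kvm_get_msrs() return value, mcg_status the MSR contents.
 * Returns -1 on failure, 1 if an MCE is being processed, 0 otherwise.
 */
static int mce_in_exception(int msrs_read, uint64_t mcg_status)
{
    if (msrs_read == -1 || msrs_read == 0)
        return -1;
    return !!(mcg_status & MCG_STATUS_MCIP);
}

/*
 * Hypothetical helper: true when the new MCE should be silently ignored.
 * Only a non-AR (SRAO) MCE is dropped, and only while another MCE is in
 * progress. If the status could not be read, injection goes ahead.
 */
static bool mce_should_drop(int in_exception, uint64_t mci_status)
{
    if (in_exception <= 0)
        return false;
    return !(mci_status & MCI_STATUS_AR);
}
```

In the dropped case nothing is written to the guest at all, so the status
of the MCE already being processed stays as it was.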

> Unrelated to this patch, it would be nice if you can share the testing
> code.

There are some test scripts and a document for this in:

git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git

The test scripts are in the "kvm" directory; the testing document is
kvm/README.

Best Regards,
Huang Ying

