Re: MCG_CAP ABI breakage (was Re: [Qemu-devel] [PATCH] target-i386: Do not set MCG_SER_P by default)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 23, 2015 at 05:43:14PM +0100, Borislav Petkov wrote:
> On Mon, Nov 23, 2015 at 01:11:27PM -0200, Eduardo Habkost wrote:
> > On Mon, Nov 23, 2015 at 11:22:37AM -0200, Eduardo Habkost wrote:
> > [...]
> > > In the case of this code, it looks like it's already broken
> > > because the resulting mcg_cap depends on host kernel capabilities
> > > (the ones reported by kvm_get_mce_cap_supported()), and the data
> > > initialized by target-i386/cpu.c:mce_init() is silently
> > > overwritten by kvm_arch_init_vcpu(). So we would need to fix that
> > > before implementing a proper compatibility mechanism for
> > > mcg_cap.
> > 
> > Fortunately, when running Linux v2.6.37 and later,
> > kvm_arch_init_vcpu() won't actually change mcg_cap (see details
> > below).
> > 
> > But the code is broken if running on Linux between v2.6.32 and
> > v2.6.36: it will clear MCG_SER_P silently (and silently enable
> > MCG_SER_P when migrating to a newer host).
> > 
> > But I don't know what we should do on those cases. If we abort
> > initialization when the host doesn't support MCG_SER_P, all CPU
> > models with MCE and MCA enabled will become unrunnable on Linux
> > between v2.6.32 and v2.6.36. Should we do that, and simply ask
> > people to upgrade their kernels (or explicitly disable MCE) if
> > they want to run latest QEMU?
> > 
> > For reference, these are the capabilities returned by Linux:
> > * KVM_MAX_MCE_BANKS is 32 since
> >   890ca9aefa78f7831f8f633cab9e4803636dffe4 (v2.6.32-rc1~693^2~199)
> > * KVM_MCE_CAP_SUPPORTED is (MCG_CTL_P | MCG_SER_P) since
> >   5854dbca9b235f8cdd414a0961018763d2d5bf77 (v2.6.37-rc1~142^2~3)
> 
> The commit message of that one says that there is MCG_SER_P support in
> the kernel.
> 
> The previous commit talks about MCE injection with KVM_X86_SET_MCE
> ioctl but frankly, I don't see that. From looking at the current code,
> KVM_X86_SET_MCE does kvm_vcpu_ioctl_x86_setup_mce() which simply sets
> MCG_CAP. And it gets those from KVM_X86_GET_MCE_CAP_SUPPORTED which is
> 
> #define KVM_MCE_CAP_SUPPORTED (MCG_CTL_P | MCG_SER_P)
> 
> So it basically sets those two supported bits. But how is
> 
> 	supported == actually present
> 
> ?!?!
> 
> That soo doesn't make any sense.

I will let the people working on the actual MCE emulation in KVM
answer that. I am assuming that KVM_MCE_CAP_SUPPORTED is set to
something that makes sense.

> 
> > * KVM_MCE_CAP_SUPPORTED is MCG_CTL_P between
> >   890ca9aefa78f7831f8f633cab9e4803636dffe4 (v2.6.32-rc1~693^2~199)
> >   and 5854dbca9b235f8cdd414a0961018763d2d5bf77 (v2.6.37-rc1~142^2~3)
> > 
> > The current definitions in QEMU are:
> >     #define MCE_CAP_DEF     (MCG_CTL_P|MCG_SER_P)
> >     #define MCE_BANKS_DEF   10
> 
> That's also wrong. The number of banks is, of course,
> generation-specific.

Note that we don't mimick every single detail of real CPUs out
there, and this isn't necessarily a problem (although sometimes
we choose bad defaults). Do you see real world scenarios when
choosing 10 as the default causes problems for guest OSes, or you
just worry that this might cause problems because it doesn't
match any real-world CPU?

The number of banks is whatever QEMU chooses to be the number of
banks. The pc-*-2.4 and older machine-types already expose 10
banks, but we can change it in future machine-types.


> The qemu commit adding that is
> 
> 79c4f6b08009 ("QEMU: MCE: Add MCE simulation to qemu/tcg")
> 
> and I really don't get what the reasoning behind it is? Is this supposed
> to mimick a certain default CPU which is not really resembling a real
> CPU or some generic CPU or what is it?

I don't know the reasoning behind those defaults, but that's the
existing default in pc-*-2.4 and older.

> 
> Because the moment you start qemu with -cpu <something AMD>, all that
> MCA information is totally wrong.

If we really care about matching the number of banks of real
CPUs, we can make it configurable, defined by the CPU model,
and/or have better defaults in future machine-types. That won't
be a problem.

But I still don't know what we should do when somebody runs:
  -machine pc-i440fx-2.4 -cpu Penryn
on a host kernel that doesn't report MCG_SER_P on
KVM_MCE_CAP_SUPPORTED.

> 
> > The target-i386/cpu.c:mce_init() code sets mcg_cap to:
> >     env->mcg_cap == MCE_CAP_DEF | MCE_BANKS_DEF;
> >                  == (MCG_CTL_P|MCG_SER_P) | 10;
> > 
> > The kvm_arch_init_vcpu() code that changes mcg_cap does the following:
> >     kvm_get_mce_cap_supported(cs->kvm_state, &mcg_cap, &banks);
> >     if (banks > MCE_BANKS_DEF) {
> >         banks = MCE_BANKS_DEF;
> >     }
> >     mcg_cap &= MCE_CAP_DEF;
> >     mcg_cap |= banks;
> >     env->mcg_cap = mcg_cap;
> >   * Therefore, if running Linux v2.6.37 or newer, this will be
> >     the result:
> >     banks == 10;
> >     mcg_cap == (MCG_CTL_P|MCG_SER_P) | banks
> >             == (MCG_CTL_P|MCG_SER_P) | 10;
> >     * That's the same value set by mce_init(), so fortunately
> >       kvm_arch_init_vcpu() isn't actually changing mcg_cap
> >       if running Linux v.2.6.37 and newer.
> >   * However, if running Linux between v2.6.32 and v2.6.37,
> >     kvm_arch_init_vcpu() will silently clear MCG_SER_P.
> 
> I don't understand - you want that cleared when emulating a !Intel CPU
> or an Intel CPU which doesn't support SER. This bit should be clear
> there. Why should be set at all there?

I am just saying we already clear it when running on Linux
v2.6.32-v2.6.36, it doesn't matter the host CPU or the -cpu
options we have. And we do not clear it when running Linux
v2.6.37 or newer. That's the behavior of pc-*-2.4 and older, even
if we change it on future machine-types.

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux