MCG_CAP ABI breakage (was Re: [Qemu-devel] [PATCH] target-i386: Do not set MCG_SER_P by default)

Eduardo Habkost <ehabkost@xxxxxxxxxx> · Mon, 23 Nov 2015 13:11:27 -0200

On Mon, Nov 23, 2015 at 11:22:37AM -0200, Eduardo Habkost wrote:
[...]
> In the case of this code, it looks like it's already broken
> because the resulting mcg_cap depends on host kernel capabilities
> (the ones reported by kvm_get_mce_cap_supported()), and the data
> initialized by target-i386/cpu.c:mce_init() is silently
> overwritten by kvm_arch_init_vcpu(). So we would need to fix that
> before implementing a proper compatibility mechanism for
> mcg_cap.

Fortunately, when running Linux v2.6.37 and later,
kvm_arch_init_vcpu() won't actually change mcg_cap (see details
below).

But the code is broken if running on Linux between v2.6.32 and
v2.6.36: it will clear MCG_SER_P silently (and silently enable
MCG_SER_P when migrating to a newer host).

But I don't know what we should do on those cases. If we abort
initialization when the host doesn't support MCG_SER_P, all CPU
models with MCE and MCA enabled will become unrunnable on Linux
between v2.6.32 and v2.6.36. Should we do that, and simply ask
people to upgrade their kernels (or explicitly disable MCE) if
they want to run latest QEMU?

For reference, these are the capabilities returned by Linux:
* KVM_MAX_MCE_BANKS is 32 since
  890ca9aefa78f7831f8f633cab9e4803636dffe4 (v2.6.32-rc1~693^2~199)
* KVM_MCE_CAP_SUPPORTED is (MCG_CTL_P | MCG_SER_P) since
  5854dbca9b235f8cdd414a0961018763d2d5bf77 (v2.6.37-rc1~142^2~3)
* KVM_MCE_CAP_SUPPORTED is MCG_CTL_P between
  890ca9aefa78f7831f8f633cab9e4803636dffe4 (v2.6.32-rc1~693^2~199)
  and 5854dbca9b235f8cdd414a0961018763d2d5bf77 (v2.6.37-rc1~142^2~3)

The current definitions in QEMU are:
    #define MCE_CAP_DEF     (MCG_CTL_P|MCG_SER_P)
    #define MCE_BANKS_DEF   10

The target-i386/cpu.c:mce_init() code sets mcg_cap to:
    env->mcg_cap == MCE_CAP_DEF | MCE_BANKS_DEF;
                 == (MCG_CTL_P|MCG_SER_P) | 10;

The kvm_arch_init_vcpu() code that changes mcg_cap does the following:
    kvm_get_mce_cap_supported(cs->kvm_state, &mcg_cap, &banks);
    if (banks > MCE_BANKS_DEF) {
        banks = MCE_BANKS_DEF;
    }
    mcg_cap &= MCE_CAP_DEF;
    mcg_cap |= banks;
    env->mcg_cap = mcg_cap;
  * Therefore, if running Linux v2.6.37 or newer, this will be
    the result:
    banks == 10;
    mcg_cap == (MCG_CTL_P|MCG_SER_P) | banks
            == (MCG_CTL_P|MCG_SER_P) | 10;
    * That's the same value set by mce_init(), so fortunately
      kvm_arch_init_vcpu() isn't actually changing mcg_cap
      if running Linux v.2.6.37 and newer.
  * However, if running Linux between v2.6.32 and v2.6.37,
    kvm_arch_init_vcpu() will silently clear MCG_SER_P.

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html