On Mon, Nov 23, 2015 at 05:42:08PM -0200, Eduardo Habkost wrote:
> I will let the people working on the actual MCE emulation in KVM
> answer that. I am assuming that KVM_MCE_CAP_SUPPORTED is set to
> something that makes sense.

Well, IMHO that should be handled the same way as all the feature bits
assigned to the ->feature arrays of the different CPU types in qemu's
X86CPUDefinition descriptors.

> Note that we don't mimic every single detail of real CPUs out
> there, and this isn't necessarily a problem (although sometimes
> we choose bad defaults). Do you see real world scenarios when
> choosing 10 as the default causes problems for guest OSes, or you
> just worry that this might cause problems because it doesn't
> match any real-world CPU?

Well, the problems would come when guests start using the MCA
infrastructure bits. That's why I asked how exactly people imagine doing
all the hardware error handling in the guest.

I know we do something with poisoning pages, i.e.
kvm_send_hwpoison_signal() and all the machinery around it, but in that
particular case it is the hypervisor which marks the pages as poisoned;
kvm notices that on the __get_user_pages() path and the error is
injected into the guest. AFAICT, of course.

In my case, I'm injecting a HW error in the guest kernel by writing into
the *guest* MSRs, and the *guest* kernel MCA code is supposed to handle
the error. And the problem here is that I'm emulating an AMD guest, but
one which sports an Intel-only feature, and that puzzles the guest
kernel.

Does that make more sense? I hope...

> If we really care about matching the number of banks of real
> CPUs, we can make it configurable, defined by the CPU model,
> and/or have better defaults in future machine-types. That won't
> be a problem.

I think we should try to do that if we're striving for accurate
emulation of guest CPUs. But then there's the migration use-case, which
has a different focus...
> But I still don't know what we should do when somebody runs:
>   -machine pc-i440fx-2.4 -cpu Penryn
> on a host kernel that doesn't report MCG_SER_P on
> KVM_MCE_CAP_SUPPORTED.

Right, before we ask that question we should ask the more generic one:
how do people want to do error handling in the guest? Do they even want
to? More importantly, does it even make sense to handle hardware errors
in the guest? If so, which ones, and if not, why not?

I mean, no one would've noticed the MCG_SER_P issue if no one had tried
to use it and what it implies. So it all comes down to whether the guest
uses the emulated feature.

It seems to me this issue went unnoticed for such a long time for the
simple reason that nothing used it. Nothing in the guest cared whether
SER_P is set or not, or how many MCA banks there are.

So if you run "-machine pc-i440fx-2.4 -cpu Penryn" it wouldn't matter
because, AFAIK (and correct me if I'm wrong here), the guest never got
to see the Action Required and Action Optional MCEs which result from
SER_P support. So the guest didn't care.

Yes, no, am I missing something completely here?

> I am just saying we already clear it when running on Linux
> v2.6.32-v2.6.36, it doesn't matter the host CPU or the -cpu
> options we have. And we do not clear it when running Linux
> v2.6.37 or newer. That's the behavior of pc-*-2.4 and older, even
> if we change it on future machine-types.

Right, ok. So the fact that it was clear in the v2.6.32-v2.6.36 range,
set later, and nothing complained *probably* confirms my theory that the
guest didn't really care about that setting, and it probably doesn't
now either...

Unless you try to use it, like I did :-)

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.