On Thu, Jan 11, 2018 at 11:51:53AM +0100, Paolo Bonzini wrote:
> On 11/01/2018 10:31, Paul Mackerras wrote:
> > Hi Paolo,
> >
> > This is a pull request for a commit that adds three new KVM
> > capabilities as part of the mitigation for the recently announced
> > exploits CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754 (also known as
> > meltdown and spectre). These capabilities tell userspace about
> > whether the host machine has the vulnerabilities, and if so, whether
> > it has updated firmware that enables the machine to provide
> > instructions to help work around the vulnerabilities.
> >
> > Michael Ellerman has put the changes needed for kernels to use the
> > workaround instructions to work around CVE-2017-5754 (meltdown) into
> > his fixes branch and intends to ask Linus to pull them for 4.15. In a
> > guest kernel, the workarounds depend on getting information from the
> > platform from a new H_GET_CPU_CHARACTERISTICS hypercall. These
> > capabilities provide the information that userspace (e.g. QEMU) needs
> > in order to implement that hypercall. In the absence of the
> > hypercall, patched guest kernels will assume the machine is vulnerable
> > and will use a (slow) displacement flush loop to flush the L1 cache
> > each time the kernel exits to userspace.
>
> Why three capabilities? Could KVM just return
> KVM_PPC_GET_HOST_CPU_CHARACTERISTICS (perhaps only the characteristics
> word and not the behavior ones)?

The three capabilities were what came out of a discussion with David
Gibson about how QEMU would implement the H_GET_CPU_CHARACTERISTICS
hypercall. David wanted to be able to set a required minimum level of
capability across a cluster of machines (i.e. migration domain) so
that a guest that was expecting a certain level of security could rely
on getting that regardless of which host it got migrated to.
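
As a rough illustration of that minimum-level policy (a sketch only, not
QEMU or KVM code; every name below is hypothetical), assume each
vulnerability is described by one of three ordered states: broken,
workaround available, or fixed. Checking a host against a cluster-wide
minimum, and choosing what to tell the guest, might then look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ordered states for one vulnerability; higher is safer. */
enum vuln_level {
    VULN_BROKEN = 0,     /* vulnerable, no workaround instructions */
    VULN_WORKAROUND = 1, /* vulnerable, workaround instructions available */
    VULN_FIXED = 2,      /* not vulnerable */
};

/*
 * Decide what to advertise to the guest for one vulnerability.
 *
 * Returns false if the host is below the required minimum, in which
 * case the guest should not be started on (or migrated to) this host.
 * Otherwise stores the required minimum, not the host's actual level,
 * in *advertised, so the guest sees the same level on every host in
 * the migration domain.
 */
static bool level_to_advertise(enum vuln_level host,
                               enum vuln_level required_min,
                               enum vuln_level *advertised)
{
    assert(advertised != NULL);

    if (host < required_min)
        return false;

    *advertised = required_min;
    return true;
}
```

With these assumed definitions, a fixed host in a domain whose minimum is
"workaround" still advertises "workaround", while a broken host is
rejected for that domain.
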
Expressing the host capability in terms of broken/workaround/fixed for
each of the potential vulnerabilities seemed like the clearest way to
represent the situation. QEMU can then have a minimum security level
set on the command line, check that against the host capabilities, and
only advertise the minimum level to guests. Thus QEMU might tell the
guest via the H_GET_CPU_CHARACTERISTICS hypercall that it needs to
apply workarounds even on a host which is actually fixed (the
workaround instructions would be no-ops in that case), so that the
guest can then be migrated to a host which needs the workarounds.

As to the representation, we could have defined an ioctl to return the
"character" and "behaviour" words just like H_GET_CPU_CHARACTERISTICS.
We would need the "behaviour" word because that's how we will tell the
guest which workarounds it doesn't need to implement, on machines
which don't have one or more of the vulnerabilities. However, QEMU
can't pass that information unmodified to the guest in general, and I
think David felt the logic would be clearer working from a separate
state for each vulnerability rather than having to decode that
information from the "character" and "behaviour" words.

If you think that a new ioctl returning character+behaviour is
preferable, I can code that up easily enough.

Regards,
Paul.