On 01/30/2018 03:46 PM, Christophe de Dinechin wrote:
>
>> On 30 Jan 2018, at 13:11, Christian Borntraeger <borntraeger@xxxxxxxxxx> wrote:
>>
>> On 01/30/2018 01:23 AM, Linus Torvalds wrote:
>> [...]
>>>
>>> So I actually have a _different_ question to the virtualization
>>> people. This includes the vmware people, but it also obviously
>>> includes the Amazon AWS kind of usage.
>>>
>>> When you're a hypervisor (whether vmware or Amazon), why do you even
>>> end up caring about these things so much? You're protected from
>>> meltdown thanks to the virtual environment already having separate
>>> page tables. And the "big hammer" approach to spectre would seem to
>>> be to just make sure the BTB and RSB are flushed at vmexit time - and
>>> even then you might decide that you really want to just move it to
>>> vmenter time, and only do it if the VM has changed since last time
>>> (per CPU).
>>>
>>> Why do you even _care_ about the guest, and how it acts wrt Skylake?
>>> What you should care about is not so much the guests (which do their
>>> own thing) but protect guests from each other, no?
>>>
>>> So I'm a bit mystified by some of this discussion within the context
>>> of virtual machines. I think that is separate from any measures that
>>> the guest machine may then decide to partake in.
>>>
>>> If you are ever going to migrate to Skylake, I think you should just
>>> always tell the guests that you're running on Skylake. That way the
>>> guests will always assume the worst case situation wrt Spectre.
>>>
>>> Maybe that mystification comes from me missing something.
>>
>> I can only speak for KVM, but I think the hypervisor issues come from
>> the fact that for migration purposes the hypervisor "lies" to the guest
>> in regard to what kind of CPU is running. (it has to lie, see below).
>>
>> This is to avoid random guest crashes by not announcing features. For
>> example if you want to migrate back and forth between a system that
>> has AVX512 and another one that has not, you must tell the guest that
>> AVX512 is not available - even if it runs on the capable system.
>>
>> To protect against new features the hypervisor only announces features
>> that it understands.
>> So you essentially start a VM in QEMU of a given CPU type that is
>> constructed of a base cpu type plus extra features. Before migration,
>> it is checked if the target system can run a guest of given type -
>> otherwise migration is rejected.
>>
>> The management stack also knows things like baselining - basically
>> creating the best possible guest CPU given a set of hosts.
>>
>> The problem now is: If you have, let's say, Broadwell and Skylakes.
>> What kind of CPU type are you telling your guest? If you claim
>> broadwell but run on skylake then you prevent that the guest can
>> protect itself, because the guest does not know that it should do
>> something special. If you say skylake the guest might start using
>> features that broadwell does not understand.
>
> I believe that Linus’ question was whether it makes sense to defer
> the entirety of the protection to the host kernel, although I was a bit
> confused by his suggestion to always assume Skylake.
>
> In other words, is it safe enough to rely on the host kernel countermeasure
> to protect guest kernels and their applications? In which case having
> the guest believe it runs on Broadwell would not be that problematic.
>
> Aren’t there enough vmexits on the guest kernel context switch
> to enforce protection on its behalf? Even if it’s
>
> a) some old kernel without mitigation code
>
> or
>
> b) some new kernel that thinks it runs on an old CPU and disabled mitigation
>

I think it is not safe to protect only the host. A CPU-bound workload in
the guest switches frequently between guest user and guest kernel mode
without triggering an exit, so host-side mitigations done at vmexit or
vmenter time never run on those transitions.
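For illustration only, the baselining and pre-migration check described in the quoted text can be sketched as plain set operations. This is a minimal sketch, assuming each host CPU is modeled as a set of feature-flag names; the host names and flag sets below are hypothetical, and QEMU/libvirt actually work with named CPU models and full CPUID data, so this is not their implementation.

```python
# Hypothetical host feature sets; real CPUID feature lists are much longer.
HOSTS = {
    "host-broadwell": {"sse4_2", "avx", "avx2"},
    "host-skylake": {"sse4_2", "avx", "avx2", "avx512f"},
}


def baseline(host_features):
    """Best common guest CPU: only the features every host can provide."""
    return set.intersection(*host_features)


def can_migrate(guest_features, target_features):
    """Migration is allowed only if the target offers every guest feature."""
    return guest_features <= target_features


guest = baseline(list(HOSTS.values()))
print(sorted(guest))
for name, feats in HOSTS.items():
    print(name, can_migrate(guest, feats))

# A guest started with the full Skylake feature set cannot move to Broadwell.
print(can_migrate(HOSTS["host-skylake"], HOSTS["host-broadwell"]))
```

With a mixed Broadwell/Skylake pool the baseline drops avx512f, so the guest sees a Broadwell-level CPU and has no hint that it may in fact be running on Skylake.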