Re: [RFC] qemu: Redesigning guest CPU configuration

Daniel Hansel <daniel.hansel@xxxxxxxxxxxxxxxxxx> · Tue, 23 Jun 2015 11:09:57 +0200

On 19.06.2015 14:27, Daniel Hansel wrote:
> 
> 
> On 18.06.2015 15:41, Daniel P. Berrange wrote:
>> On Wed, Jun 17, 2015 at 05:37:42PM +0200, Jiri Denemark wrote:
>>> Hi all (and sorry for the long email),
>>>
>>> The way the QEMU driver currently handles guest CPU configuration is not
>>> ideal. We detect host CPU capabilities only by querying the CPU and we
>>> don't check with QEMU what features it supports. We don't check QEMU's
>>> definitions of CPU models, which may be different from libvirt's
>>> definitions. All this results in several issues:
>>>
>>> - guest CPU may change during migration or save/restore
>>> - libvirt may ask for a CPU which QEMU cannot provide; the guest will
>>>   see a slightly different CPU but libvirt client won't know about it
>>> - libvirt may come up with a CPU that doesn't make sense and which won't
>>>   work for a guest (the guest may even crash)
>>>
>>> Although usually everything just works, it is very fragile.
>>
>> Another issue is that if there is no <cpu> in the guest config, we
>> just delegate the CPU choice to QEMU and then skip all CPU checks when
>> migrating. If libvirt owns the full CPU config, we'd probably want
>> to also decide the default ourselves, so that we will always be able
>> to do CPU checks on migration.
>>
>>> Since we want to fix all these issues, we need to:
>>> - guarantee stable guest ABI (a single domain XML should always result
>>>   in the same guest ABI). Once a domain is started, its CPU definition
>>>   should never change (unless someone changes the XML, of course,
>>>   similar to, e.g. PCI addresses). However, there are a few exceptions:
>>>     - host-passthrough CPU mode will always result in "-cpu host"
>>>     - host-model CPU mode should recompute the CPU model on every start,
>>>       but the CPU must not change during migration
>>> - always make sure QEMU provides the CPU we asked for. Starting a domain
>>>   should fail in case QEMU cannot provide exactly the CPU we asked for.
>>> - provide usable host-model mode and custom mode with minimum match. We
>>>   need to generate CPU configurations that actually work, i.e., we need
>>>   to ask QEMU what CPU it can provide on current host rather than
>>>   requesting a bunch of features on top of a CPU model which does not
>>>   always match the host CPU.
>>>
>>> QEMU already provides or will soon provide everything we need to meet
>>> these requirements:
>>> - we can cover every configurable part of a CPU in our cpu_map.xml and
>>>   instead of asking QEMU for a specific CPU model we can use "-cpu
>>>   custom" with a fully specified CPU
>>> - we can use the additional data about CPU models to choose the right
>>>   one for a host CPU
>>> - when starting a domain we can check whether QEMU filtered out any of
>>>   the features we asked for and refuse to start the domain
>>> - we can ask QEMU what "-cpu host" would look like and use that for
>>>   host-model and minimum match CPUs (it won't work for TCG mode, but we
>>>   can keep using the current CPUID detection code for TCG)
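
The "refuse to start the domain" check from the list above can be sketched roughly as follows. This is only an illustration: check_cpu_features() and the feature names are hypothetical, and how the list of features QEMU actually enabled is obtained is left open here.

```python
def check_cpu_features(requested, provided):
    """Raise if any requested feature is absent from what QEMU provided.

    Both arguments are plain iterables of feature names (hypothetical
    inputs; this is not libvirt's actual data model).
    """
    missing = sorted(set(requested) - set(provided))
    if missing:
        raise RuntimeError(
            "QEMU cannot provide requested CPU features: " + ", ".join(missing)
        )


# Every requested feature is available, so this passes silently.
check_cpu_features(["feat-a", "feat-b"], ["feat-a", "feat-b", "feat-c"])
```

The point is only that the comparison is strict: a single filtered feature is enough to fail the start instead of silently giving the guest a different CPU.
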
>>
>> In TCG mode, of course, 'host-model' and 'host-passthrough' are
>> effectively identical, and don't actually need the host to support
>> all the features, since TCG is fully emulated. Which means that you
>> can migrate TCG guests to any host with any model :-) I wonder if
>> we are accidentally restricting that today, because we assume, as
>> with KVM, that host support is needed.
>>
>>> Once we start maintaining CPU models with all the details, we will
>>> likely meet the same issues QEMU folks meet, i.e., we will need to fix
>>> bugs in existing CPU models. And it's not just about adding or removing CPU
>>> features but also fixing other parameters, such as wrong level, etc.
>>> It's clear every change will require a new CPU model to be defined. But
>>> I think we should do it in a way that applications or users should not
>>> need (if they don't want to) to care about it. I'm thinking about doing
>>> something similar to machine types. Each CPU model could be defined in
>>> several versions and a CPU spec without a version would be an alias to
>>> the latest version.
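
To make the machine-type analogy concrete, here is a minimal sketch of resolving an unversioned model name to its latest version. The registry, the "ExampleModel" names and resolve_cpu_model() are all hypothetical; nothing like this exists in cpu_map.xml today.

```python
# Hypothetical registry: base model name -> versions, oldest first.
CPU_MODEL_VERSIONS = {
    "ExampleModel": ["ExampleModel-1.0", "ExampleModel-1.1"],
}


def resolve_cpu_model(name, registry=CPU_MODEL_VERSIONS):
    """Return the versioned model a name refers to.

    A fully versioned name is returned unchanged; a bare name is treated
    as an alias for the latest known version.
    """
    for base, versions in registry.items():
        if name in versions:
            return name
        if name == base:
            return versions[-1]
    raise KeyError("unknown CPU model: " + name)


print(resolve_cpu_model("ExampleModel"))      # alias for the latest version
print(resolve_cpu_model("ExampleModel-1.0"))  # explicit version is kept
```

As with machine types, the resolved name would presumably have to be recorded once for a domain, so that a later fix to the model does not change an already defined guest.
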
>>
>> Agreed, I think that versioning CPU models independently of machine
>> types makes sense. It is probably a little more complex - in most cases
>> we'd increase the version, but in some cases I think we'd end up wanting
>> to define new named models. For example, with the recent TSX scenario we
>> had, using versions would not have been appropriate, because Intel in
>> fact ships two variants of the silicon. So even with versioning, we
>> would still have wanted to introduce the noTSX variants of the models.
>>
>>> The problem is, we need to maintain backward compatibility and we should
>>> avoid breaking existing domains (shouldn't we?) which just work even
>>> though their guest CPUs do not exactly match the domain XML definitions.
>>
>> Yep breaking existing domains isn't too pleasant!
>>
>>> So either we need to define all existing CPU models in all their
>>> variants used for various machine types and have a mapping between
>>> (model without a version, machine type) to a specific version of the
>>> model (which may be quite hard) or we need to be able to distinguish
>>> between an existing domain and a new domain with no CPU model version.
>>> While host-model and host-passthrough CPU modes are easy because they
>>> are designed to change every time a domain starts (which means we don't
>>> need to be able to distinguish between existing and new domains), custom
>>> CPU mode is tricky. Currently, the only halfway reasonable idea that
>>> has come to my mind is to have a new CPU mode, but it still seems
>>> awkward, so please share your ideas if you have any.
>>
>> Introducing a new CPU mode feels pretty unpleasant to me.
>>
>> Although it will certainly be tedious work, getting details of all the
>> CPU variants for historical machine types should be doable I think.
>>
>>> BTW, I don't think we should try to expose every part of the CPU model
>>> definitions in domain XML, they should remain hidden behind the CPU
>>> model name. It would be hard to explain what each of the extra
>>> parameters means, each model would have to include them anyway since we
>>> can't expect users to provide all the details correctly, and once
>>> visible in domain XML it could encourage users to play with the values.
>>
>> Yeah, I don't think we need to expose all the raw details. If people really
>> badly want to be able to customize that, then we should instead look at
>> how we could better enable the cpu_map.xml file to be admin extensible.
> 

Hi Daniel and Jirka,

just a ping in case you missed my comment...

> Hi,
> 
> Michael Mueller (IBM) is currently working on an extension of QEMU to support CPU models for the s390x platform.
> During the discussion on the QEMU mailing list the implementation was made more generic so that it supports all platforms.
> 
> Based on that new implementation I have implemented a first version for libvirt that retrieves the CPU model(s) supported by QEMU on s390x.
> Because the discussion is still ongoing, my prototype is not ready to be tested yet.
> 
> A short overview of the current prototype I have implemented (QEMU CPU model support patches from Michael Mueller required):
> 
> 1. When the libvirt daemon starts, the QEMU monitor is used to retrieve the CPU models QEMU supports (i.e. just the model names; QEMU handles all other settings such as features).
> 2. The supported CPU models are stored in libvirt's QEMU capabilities (and written to the capabilities cache file).
> 3. Each call of virConnectGetCPUModelNames() (i.e. qemuConnectGetCPUModelNames()) retrieves the information from the QEMU capabilities (cached or not) on the s390x platform.
> All other platforms keep the current approach of parsing cpu_map.xml.
> 
> With that implementation, all requests for CPU models (e.g. for CPU model comparison or CPU model listing) return a more accurate result (e.g. if the QEMU binary is replaced by a
> manually built one).
> 
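
For illustration, step 1 could look roughly like the sketch below. It assumes the model names come from the QMP command query-cpu-definitions, whose reply is a list of entries with a "name" field; the prototype may use a different monitor command. The function cpu_model_names() and the model names in the sample reply are made up.

```python
import json


def cpu_model_names(qmp_reply):
    """Extract sorted CPU model names from a query-cpu-definitions reply.

    qmp_reply is the raw JSON text returned by the monitor. Only the
    "name" field is used; everything else is left to QEMU.
    """
    reply = json.loads(qmp_reply)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("desc", "QMP error"))
    return sorted(entry["name"] for entry in reply["return"])


# Made-up reply; real model names depend on the QEMU binary and target.
SAMPLE_REPLY = '{"return": [{"name": "model-b"}, {"name": "model-a"}]}'
print(cpu_model_names(SAMPLE_REPLY))
```

The resulting list is what would be stored in the QEMU capabilities in step 2 and handed back by virConnectGetCPUModelNames() in step 3.
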
>>
>> Regards,
>> Daniel
>>
> 

-- 

Kind regards
Daniel Hansel

IBM Deutschland Research & Development GmbH
Chairwoman of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Böblingen
Registration court: Amtsgericht Stuttgart, HRB 243294
