On Wed, Nov 23, 2022 at 12:14:45 +0000, Daniel P. Berrangé wrote: > On Wed, Nov 23, 2022 at 12:51:39PM +0100, Jiri Denemark wrote: > > On Wed, Nov 23, 2022 at 11:41:03 +0000, Daniel P. Berrangé wrote: > > > On Wed, Nov 23, 2022 at 12:29:24PM +0100, Jiri Denemark wrote: > > > > On Wed, Nov 23, 2022 at 11:18:01 +0000, Daniel P. Berrangé wrote: > > > > > On Tue, Nov 22, 2022 at 01:48:39PM +0100, Jiri Denemark wrote: > > > > > > On Mon, Nov 21, 2022 at 12:47:58 +0530, manish.mishra wrote: > > > > > > > > > > > > > > On 18/11/22 5:00 pm, Jiri Denemark wrote: > > > > > > > > On Fri, Nov 18, 2022 at 10:18:59 +0000, John Levon wrote: > > > > > > > >> On Fri, Nov 18, 2022 at 10:52:32AM +0100, Jiri Denemark wrote: > > > > > > > >> > > > > > > > >>>> * Qemu already provide an option 'enforce' to validate if features > > > > > > > >>>> with which vm is started is exactly same as one provided and nothing > > > > > > > >>>> is silently dropped. > > > > > > > >>> Right, but it's not enough. In addition to removed features libvirt also > > > > > > > >>> checks for unexpectedly added features. And you really need to do both. > > > > > > > >>> Because if you ask for -cpu Model,feat1=on,feat2=on,enforce and QEMU > > > > > > > >>> says everything is fine, the guest might see more than what you asked. > > > > > > > >>> For example, if a feature is enabled only if a host supports it you may > > > > > > > >>> or may not get it without any complains from QEMU. But if you get it you > > > > > > > >>> really need to explicitly ask for it during migration, otherwise the > > > > > > > >>> feature can just silently disappear. Of course, this would be a really > > > > > > > >>> bad behavior from QEMU, but that does not mean it can't happen (I think > > > > > > > >>> SVM is a bit problematic in this way) and the whole point of libvirt's > > > > > > > >>> checks is to prevent this kind of issues. > > > > > > > >> Hi Jiri, I'm not following this very well. I think you're saying that qemu has > > > > > > > >> had bugs previously where features get silently enabled, > > > > > > > > Personally, I haven't actually witnessed any bug in this area (as far as > > > > > > > > I can remember, which is not that far :-)), but got some reports of at > > > > > > > > least one, even though without any proof. > > > > > > > > > > > > > > > > Specifically, SVM is quite strange as it is included in all AMD CPU > > > > > > > > models in QEMU and yet if you try to start it on a host without nesting > > > > > > > > enabled "enforce" does not complain. I saw the feature is enabled with > > > > > > > > older machine types, but I was told the magic behind this feature looks > > > > > > > > like not only machine type but even host configuration itself is > > > > > > > > involved. Anyway, although I saw reports of that the feature could be > > > > > > > > enabled without an explicit request even with new machine types I still > > > > > > > > haven't seen any proof of this happening. So I hope it just does not > > > > > > > > happen and users are only afraid of this possibility. > > > > > > > > > > > > > > > >> and it's libvirt's job/role to paper over those issues? Do you have > > > > > > > >> some specific cases of this? > > > > > > > > This is not about papering. It's actually the opposite, that is about > > > > > > > > detecting if something like this happens and reporting it as a failure > > > > > > > > rather than papering it and hoping everything goes well. So I think > > > > > > > > doing this is a good idea even though I don't think we actually saw any > > > > > > > > issue of this kind. > > > > > > > > > > > > > > > > > > > > > Hi Jiri, I see now with libvirt master, with check=='full' we verify > > > > > > > both silently dropped as well as added features. But as you already > > > > > > > stated Qemu silently adding feature is a Qemu bug and libvirt just > > > > > > > reports that bug, so it should be very unlikely, i agree that is not a > > > > > > > good reasoning :). Our requirement is that we want to use CPU Models > > > > > > > and features which are defined in Qemu but not in libvirt for e.g if > > > > > > > we want to use Icelake-Server-V4 directly or newly added vmx feature, > > > > > > > libvirt does not allow. Currently we take help of qemu to do > > > > > > > validations but for cpu feature verfication and model definations we > > > > > > > still use libvirt defined definations which prevent us to use anything > > > > > > > which is not defined in libvirt. I see there are already efforts going > > > > > > > on to get all model and feature defination from qemu itself, but not > > > > > > > sure how much time it will take. Till that happens we thought safest > > > > > > > option is to have an option to remove all validations from libvirt and > > > > > > > rely on qemu 'enforce' for migration safetly. I understand > > > > > > > qemu-enforce does not check for silently added features, but that case > > > > > > > we can assume is very unlikely and Qemu should fix otherwise VMs will > > > > > > > not poweron anyway with check=='full'. Basically we want it as an > > > > > > > modification of check='none' but also skipping things like > > > > > > > virCPUValidateFeatures and passing option 'enforce' to Qemu. Or if > > > > > > > silently adding features is that big concern we can have a check in > > > > > > > Qemu itself? I understand current qemu-enforce is not as migration > > > > > > > safe as check=='full' but probably suitable for our use case for time > > > > > > > being? > > > > > > > > > > > > I understand why you want this, but on the other hand adding just a > > > > > > plain pass-through for CPU model and features is quite dangerous as it > > > > > > can be used to set any -cpu option even if it's not a feature. And > > > > > > especially the parts that are configured in other areas of domain XML > > > > > > (such as hypervisor features). So I think instead of adding a new mode > > > > > > for <cpu> we should go another way. For things that are not yet > > > > > > supported by libvirt we support various elements in qemu namespace, > > > > > > e.g., setting QEMU command line options, environment variables, or even > > > > > > overriding options libvirt would normally set on the command line. So I > > > > > > guess we could have a special <qemu:cpu> element which would be used > > > > > > when a user wants full control over the -cpu option without any libvirt > > > > > > intervention. > > > > > > > > > > I really don't think we should allow this, especially not for something > > > > > which is clearly intended to be used for production deployment. Our > > > > > hypervisor CLI passthrough is there to allow for short term workarounds > > > > > for bugs, or to experiment with a feature before it is mapped into > > > > > libvirt in a supported manner. > > > > > > > > > > If there are aspects related to QEMU configuration thiat are not in > > > > > libvirt yet, we need to address those gaps, not provide yet another > > > > > way to bypass libvirt mapping. > > > > > > > > Indeed, this was definitely meant as a short term workaround for stuff > > > > we don't have support for yet, for testing or experimental purposes. The > > > > supported approach is for implement the missing parts in libvirt (and > > > > QEMU) as soon as possible. I don't see why would a properly documented > > > > support for experiments with -cpu would be an issue. > > > > > > Why can't it just use the exsting QEMU passthrough syntax we have. > > > I don't think we should be adding specifial support just for CPUs > > > > That would be nice, but the QEMU passthrough syntax cannot be used for > > changing options that libvirt already passes to QEMU. So using it would > > likely result in two separate -cpu options on QEMU command line. And it > > would not rule out the CPU verification code in libvirt. Remember, we > > add a default -cpu model in case there's none configured in the XML. > > Two -cpu options isn't a problem, the latter will override the former, > which is fine from the level of support intended for QEMU passthrough > usage. Oh, OK. In that case this would be usable. But see below. > For the libvirt checking, isn't the 'check=none' attr sufficient > to skip checks libvirt does. It does, although once a domain is started, we update the CPU definition in the XML to use check='full'. So check='none' is only usable for starting a domain. Migration will always use check='full'. We would need to come up with a new check value to completely turn off this functionality. Jirka