On 21.03.22 10:25, Daniel P. Berrangé wrote:
> On Fri, Mar 18, 2022 at 01:23:03PM -0400, Collin Walling wrote:
>> On 3/15/22 15:08, David Hildenbrand wrote:
>>> On 15.03.22 18:40, Boris Fiuczynski wrote:
>>>> On 3/15/22 4:58 PM, David Hildenbrand wrote:
>>>>> On 11.03.22 13:44, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> On 11.03.22 at 10:30, David Hildenbrand wrote:
>>>>>>> On 11.03.22 05:17, Collin Walling wrote:
>>>>>>>> The s390x architecture has a growing list of features that will no longer
>>>>>>>> be supported on future hardware releases. This introduces an issue with
>>>>>>>> migration such that guests, running on models with these features enabled,
>>>>>>>> will be rejected outright by machines that do not support these features.
>>>>>>>>
>>>>>>>> A current example is the CSSKE feature that has been deprecated for some time.
>>>>>>>> It has been publicly announced that gen15 will be the last release to
>>>>>>>> support this feature; however, we have postponed this to gen16a. A possible
>>>>>>>> solution to remedy this would be to create a new QEMU QMP response that allows
>>>>>>>> users to query for deprecated/unsupported features.
>>>>>>>>
>>>>>>>> This presents two parts of the puzzle: how to report deprecated features to
>>>>>>>> a user (libvirt) and how libvirt should handle this information.
>>>>>>>>
>>>>>>>> First, let's discuss the latter. The patch presented alongside this cover letter
>>>>>>>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
>>>>>>>> disabled for all s390x CPU models. This is done by simply appending the CSSKE
>>>>>>>> feature with the disabled policy to the host-model.
>>>>>>>>
>>>>>>>> libvirt pseudo:
>>>>>>>>
>>>>>>>> if arch is s390x
>>>>>>>>     set CSSKE to disabled for host-model
>>>>>>>
>>>>>>> That violates host-model semantics and possibly the user intent. There
>>>>>>> would have to be some toggle to manually specify this, for example, a
>>>>>>> new model type or some magical flag.
>>>>>>
>>>>>> What we actually want to do is to disable csske completely from QEMU and
>>>>>> thus from the host-model. Then it would not violate the spec.
>>>>>> But this has all kinds of issues (you cannot migrate from older versions
>>>>>> of software and machines) although the hardware still can provide the feature.
>>>>>>
>>>>>> The hardware guys promised me to deprecate things two generations earlier
>>>>>> and we usually deprecate things that are not used or where software has a
>>>>>> runtime switch.
>>>>>>
>>>>>> From what I hear from you, you do not want to modify the host-model
>>>>>> semantics to something more useful but rather define a new thing (e.g. "host-sane")?
>>>>>
>>>>> My take would be to keep the host model consistent, meaning the
>>>>> semantics in QEMU exactly match the semantics in libvirt. It defines the
>>>>> maximum CPU model that's runnable under KVM. If a feature is not
>>>>> included (e.g., csske), that feature cannot be enabled in any way.
>>>>>
>>>>> The "host model" has the semantics of resembling the actual host CPU.
>>>>> This is only partially true, because we support some features the host
>>>>> might not support (e.g., zPCI IIRC) and obviously don't support all host
>>>>> features in QEMU.
>>>>>
>>>>> So instead of playing games on the libvirt side with the host model, I
>>>>> see the following alternatives:
>>>>>
>>>>> 1. Remove the problematic features from the host model in QEMU, like "we
>>>>> just don't support this feature". Consequently, any migration of a VM
>>>>> with csske=on to a new QEMU version will fail, similar to having an
>>>>> older QEMU version without support for a certain feature.
>>>>>
>>>>> "host-passthrough" would change between QEMU versions ... which I see as
>>>>> problematic.
>>>>>
>>>>> 2. Introduce a new CPU model that has these new semantics: "host model"
>>>>> minus deprecated features. Migration of older VMs with csske=on to a new
>>>>> QEMU version will work.
>>>>> Make libvirt use/expand that new CPU model.
>>>>>
>>>>> It doesn't necessarily have to be an actual new CPU model. We can use a
>>>>> feature group, like "-cpu host,deprecated-features=false". What's
>>>>> inside "deprecated-features" will actually change between QEMU versions,
>>>>> but we don't really care, as the expanded CPU model won't change.
>>>>>
>>>>> "host-passthrough" won't change between QEMU versions ...
>>>>>
>>>>> 3. As Daniel suggested, don't use the host model, but a CPU model
>>>>> indicated as "suggested".
>>>>>
>>>>> The real issue is that in reality, we don't simply always use a model
>>>>> like "gen15a", but usually want optional features, if they are around.
>>>>> Prime examples are "sie" and friends.
>>>>>
>>>>>
>>>>>
>>>>> I tend to prefer 2. With 3, I see issues with optional features like
>>>>> "sie" and friends. Often, you really want "give me all you got, but
>>>>> disable deprecated features that might cause problems in the future".
>>>>>
>>>>
>>>> David,
>>>> if I understand your proposal 2 correctly, it sounds a lot like Christian's
>>>> idea of leaving the CPU mode "host-model" as is and introducing a new CPU
>>>> mode "host-recommended" for the new semantics, in which
>>>> query-cpu-model-expansion would be called with the additional
>>>> "deprecated-features" property.
>>>> That way libvirt would not have to fiddle around with the deprecation
>>>> itself and users would have the option to choose which semantics they want.
>>>> Is that correct?
>>>
>>> Yes, exactly.
>>>
>>>
>>
>> From what I understand:
>>
>> QEMU
>> - add a "deprecated-features" feature group (more-or-less David's code)
>>
>> libvirt
>> - recognize a new model name "host-recommended"
>> - query QEMU for host-model + deprecated-features and cache it in caps
>>   file (something like <hostRecCPU>)
>> - when guest is defined with "host-recommended", pull <hostRecCPU> from
>>   caps when guest is started (similar to how host-model works today)
>>
>> If this is sufficient, then I can get to work on this.
>>
>> My question is what would be the best way to include the deprecated
>> features when calculating a baseline or comparison. Both work with the
>> host-model and may no longer present an accurate result. Say, for
>> example, we baseline a z15 with a gen17 (which will outright not support
>> CSSKE). With today's implementation, this might result in a ridiculously
>> old CPU model which also does not support CSSKE. The ideal response
>> would be a z15 - deprecated features (i.e. host-recommended on a z15),
>> but we'd need a way to flag to QEMU that we want to exclude the
>> deprecated features. Or am I totally wrong about this?
>
> QEMU has a concept of versioned QEMU models, so you could define a
> z15-v2 version without CSSKE

gen15a already comes with csske=false. s390x does not implement
versioned CPU models and, as I raised in the past, that concept is rather
a bad fit for s390x.

-- 
Thanks,

David / dhildenb
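
For illustration only, here is a rough sketch of the QMP request libvirt might send for the "host-recommended" expansion discussed above. query-cpu-model-expansion and its "type"/"model" arguments are existing QMP; the "deprecated-features" property is the feature group proposed in this thread, so treat it as an assumption, and build_expansion_request is a made-up helper name, not libvirt code.

```python
import json


def build_expansion_request(model_name="host", drop_deprecated=True):
    """Build a QMP query-cpu-model-expansion command as a plain dict.

    'deprecated-features' is the feature group proposed in this thread;
    it is an assumption here, not a confirmed QEMU property.
    """
    props = {}
    if drop_deprecated:
        # Ask QEMU to expand the model with the deprecated group switched off.
        props["deprecated-features"] = False
    return {
        "execute": "query-cpu-model-expansion",
        "arguments": {
            "type": "static",
            "model": {"name": model_name, "props": props},
        },
    }


if __name__ == "__main__":
    # Print the JSON that would go over the QMP socket.
    print(json.dumps(build_expansion_request(), indent=2))
```

With drop_deprecated=False the props stay empty, which corresponds to a plain host-model expansion, so both CPU modes could share the same code path and differ only in that one property.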