On Mon, Mar 04, 2024 at 10:35:25AM -0700, Jim Fehlig wrote:
> On 3/1/24 10:13, Daniel P. Berrangé wrote:
> > On Fri, Mar 01, 2024 at 10:36:12AM -0600, Jonathon Jongsma wrote:
> > > On 3/1/24 10:13 AM, Daniel P. Berrangé wrote:
> > > > On Tue, Feb 20, 2024 at 05:08:02PM -0700, Jim Fehlig wrote:
> > > > > On 12/15/23 15:11, Jonathon Jongsma wrote:
> > > > > > Previously, the script only generated the parent CPU and any versions
> > > > > > that had a defined alias. The script now generates all CPU versions. Any
> > > > > > version that had a defined alias will continue to use that alias, but
> > > > > > those without aliases will use the generated name $BASECPUNAME-vN.
> > > > > >
> > > > > > The reason for this change is two-fold. First, we need to add new models
> > > > > > that support new features (such as SEV-SNP). To deal with this, the
> > > > > > script now generates model definitions for all versions.
> > > > > >
> > > > > > But we also need to ensure that our CPU definitions are migration-safe.
> > > > > > To deal with this issue we need to make sure we're always using the
> > > > > > canonical versioned names for CPUs.
> > > > >
> > > > > Related to migration safety, do we need to be concerned with the expansion
> > > > > of 'host-model' CPU? E.g. is it possible 'host-model' expands to EPYC before
> > > > > introducing the new models, and EPYC-v4 afterwards? If so, what are the
> > > > > ramifications of that?
> > > >
> > > > Yes, I see that happening on my laptop in domcapabilities:
> > > >
> > > > Currently libvirt reports:
> > > >
> > > >   <mode name='host-model' supported='yes'>
> > > >     <model fallback='forbid'>Snowridge</model>
> > > >     <vendor>Intel</vendor>
> > > >     <maxphysaddr mode='passthrough' limit='46'/>
> > > >     <feature policy='require' name='ss'/>
> > > >     <feature policy='require' name='vmx'/>
> > > >     ...snip...
> > > >
> > > > and after this series it reports:
> > > >
> > > >   <mode name='host-model' supported='yes'>
> > > >     <model fallback='forbid'>Snowridge-v4</model>
> > > >     <vendor>Intel</vendor>
> > > >     <maxphysaddr mode='passthrough' limit='46'/>
> > > >     <feature policy='require' name='ss'/>
> > > >     <feature policy='require' name='vmx'/>
> > > >     ...snip...
> > > >
> > > > That's not wrong per se, because Snowridge-v4 has a smaller
> > > > delta against my host CPU.
> > > >
> > > > The problem is that libvirt updates the *live* XML for the
> > > > guest with this expansion. IIUC, if we now attempt to
> > > > live migrate to a compatible machine running older libvirt,
> > > > the migration will fail, as old libvirt doesn't know the -v4
> > > > CPU.
>
> Downstream, we (SUSE) don't really support migrating from new -> old. Is
> this something we aim to support upstream?

Kind of, sort of, yes and no :)

The VIR_DOMAIN_XML_MIGRATABLE flag is a bit of an attempt to make it
possible to format XML in a way that's (hopefully) mostly acceptable
to older libvirt. The devil is in the detail though, and there's never
really been any formal testing to prove correctness, so new -> old
is one of those things that may work; please report bugs if we missed
something.

> > > > I'm not sure how to address this?
> > >
> > > But don't we have this issue any time we add a new CPU model to libvirt?
> > > Any time there's a new model, it has the potential to be a closer match to
> > > the host CPU than an existing model definition was. As I mentioned in my
> > > previous reply, when e.g. the -noTSX CPU variants were added, didn't the
> > > same sort of thing (potentially) happen? Or am I doing something
> > > meaningfully different in this patch set than what happens in those
> > > scenarios?
> >
> > I think it probably /did/ happen, but that doesn't make it acceptable.
> > The noTSX stuff was the cause of massive amounts of compatibility pain
> > for mgmt apps, so the incompatibility in libvirt might have been glossed
> > over. We're adding a lot of new versions here, so we're possibly increasing
> > the visibility/impact of this libvirt change.
>
> It can happen when we introduce an entirely new CPU model too. E.g. on a
> Genoa machine, prior to commit bfe53e9145c, host model expanded to

Yeah, true, so that's a general problem with 'host-model' when
introducing new CPU generations, if that post-dates a user deploying
on said CPU generation.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|