* Daniel P. Berrangé (berrange@xxxxxxxxxx) wrote: > On Fri, Nov 02, 2018 at 10:40:35AM +0100, Paolo Bonzini wrote: > > On 02/11/2018 04:46, Liran Alon wrote: > > >> On Thu, Nov1, 2018 at 09:45 AM, Jim Mattson <jmattson@xxxxxxxxxx> wrote: > > > > > >>> On Thu, Nov 1, 2018 at 8:56 AM, Dr. David Alan Gilbert <dgilbert@xxxxxxxxxx> wrote: > > > > > >>> So if I have matching host kernels it should always work? > > >>> What happens if I upgrade the source kernel to increase it's maximum > > >>> nested size, can I force it to keep things small for some VMs? > > > > > >> Any change to the format of the nested state should be gated by a > > >> KVM_CAP set by userspace. (Unlike, say, how the > > >> KVM_VCPUEVENT_VALID_SMM flag was added to the saved VCPU events state > > >> in commit f077825a8758d.) KVM has traditionally been quite bad about > > >> maintaining backwards compatibility, but I hope the community is more > > >> cognizant of the issues now. > > > > > >> As a cloud provider, one would only enable the new capability from > > >> userspace once all hosts in the pool have a kernel that supports it. > > >> During the transition, the capability would not be enabled on the > > >> hosts with a new kernel, and these hosts would continue to provide > > >> nested state that could be consumed by hosts running the older kernel. > > > > > > Hmm this makes sense. > > > > > > This means though that the patch I have submitted here isn't good enough. > > > My patch currently assumes that when it attempts to get nested state from KVM, > > > QEMU should always set nested_state->size to max size supported by KVM as received > > > from kvm_check_extension(s, KVM_CAP_NESTED_STATE); > > > (See kvm_get_nested_state() introduced on my patch). > > > This indeed won't allow migration from host with new KVM to host with old KVM if > > > nested_state size was enlarged between these KVM versions. > > > Which is obviously an issue. > > > > Actually I think this is okay, because unlike the "new" capability was > > enabled, KVM would always reduce nested_state->size to a value that is > > compatible with current kernels. > > > > > But on second thought, I'm not sure that this is the right approach as-well. > > > We don't really want the used version of nested_state to be determined on kvm_init(). > > > * On source QEMU, we actually want to determine it when preparing for migration based > > > on to the support given by our destination host. If it's an old host, we would like to > > > save an old version nested_state and if it's a new host, we will like to save our newest > > > supported nested_state. > > > > No, that's wrong because it would lead to losing state. If the source > > QEMU supports more state than the destination QEMU, and the current VM > > state needs to transmit it for migration to be _correct_, then migration > > to that destination QEMU must fail. > > > > In particular, enabling the new KVM capability needs to be gated by a > > new machine type and/or -cpu flag, if migration compatibility is needed. > > (In particular, this is one reason why I haven't considered this series > > for 3.1. Right now, migration of nested hypervisors is completely > > busted but if we make it "almost" work, pre-3.1 machine types would not > > ever be able to add support for KVM_CAP_EXCEPTION_PAYLOAD. Therefore, > > it's better for users if we wait for one release more, and add support > > for KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD at the same time). > > > > Personally, I would like to say that, starting from QEMU 3.2, enabling > > nested VMX requires a 4.20 kernel. It's a bit bold, but I think it's a > > good way to keep some sanity. Any opinions on that? > > We have usually followed a rule that new machine types must not > affect runability of a VM on a host. IOW new machine types should > not introduce dependancies on specific kernels, or hardware features > such as CPU flags. > > Anything that requires a new kernel feature thus ought to be an > opt-in config tunable on the CLI, separate from machine type > choice. Alternatively in this case, it could potentially be a > migration parameter settable via QMP. QEMU on each side could > advertize whether the migration parameter is available, and > the mgmt app (which can see both sides of the migration) can > then decide whether to enable it. This is a little odd though since it relates to the contents/size/consistency of the guest state directly. Dave > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK