* Paolo Bonzini (pbonzini@xxxxxxxxxx) wrote: > On 02/11/2018 04:46, Liran Alon wrote: > >> On Thu, Nov1, 2018 at 09:45 AM, Jim Mattson <jmattson@xxxxxxxxxx> wrote: > > > >>> On Thu, Nov 1, 2018 at 8:56 AM, Dr. David Alan Gilbert <dgilbert@xxxxxxxxxx> wrote: > > > >>> So if I have matching host kernels it should always work? > >>> What happens if I upgrade the source kernel to increase it's maximum > >>> nested size, can I force it to keep things small for some VMs? > > > >> Any change to the format of the nested state should be gated by a > >> KVM_CAP set by userspace. (Unlike, say, how the > >> KVM_VCPUEVENT_VALID_SMM flag was added to the saved VCPU events state > >> in commit f077825a8758d.) KVM has traditionally been quite bad about > >> maintaining backwards compatibility, but I hope the community is more > >> cognizant of the issues now. > > > >> As a cloud provider, one would only enable the new capability from > >> userspace once all hosts in the pool have a kernel that supports it. > >> During the transition, the capability would not be enabled on the > >> hosts with a new kernel, and these hosts would continue to provide > >> nested state that could be consumed by hosts running the older kernel. > > > > Hmm this makes sense. > > > > This means though that the patch I have submitted here isn't good enough. > > My patch currently assumes that when it attempts to get nested state from KVM, > > QEMU should always set nested_state->size to max size supported by KVM as received > > from kvm_check_extension(s, KVM_CAP_NESTED_STATE); > > (See kvm_get_nested_state() introduced on my patch). > > This indeed won't allow migration from host with new KVM to host with old KVM if > > nested_state size was enlarged between these KVM versions. > > Which is obviously an issue. > > Actually I think this is okay, because unlike the "new" capability was > enabled, KVM would always reduce nested_state->size to a value that is > compatible with current kernels. > > > But on second thought, I'm not sure that this is the right approach as-well. > > We don't really want the used version of nested_state to be determined on kvm_init(). > > * On source QEMU, we actually want to determine it when preparing for migration based > > on to the support given by our destination host. If it's an old host, we would like to > > save an old version nested_state and if it's a new host, we will like to save our newest > > supported nested_state. > > No, that's wrong because it would lead to losing state. If the source > QEMU supports more state than the destination QEMU, and the current VM > state needs to transmit it for migration to be _correct_, then migration > to that destination QEMU must fail. > > In particular, enabling the new KVM capability needs to be gated by a > new machine type and/or -cpu flag, if migration compatibility is needed. > (In particular, this is one reason why I haven't considered this series > for 3.1. Right now, migration of nested hypervisors is completely > busted but if we make it "almost" work, pre-3.1 machine types would not > ever be able to add support for KVM_CAP_EXCEPTION_PAYLOAD. Therefore, > it's better for users if we wait for one release more, and add support > for KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD at the same time). > > Personally, I would like to say that, starting from QEMU 3.2, enabling > nested VMX requires a 4.20 kernel. It's a bit bold, but I think it's a > good way to keep some sanity. Any opinions on that? That seems a bit mean; there's a lot of people already using nested. Dave > Paolo > > > Therefore, I don't think that we want this versioning to be based on KVM_CAP at all. > > It seems that we would want the process to behave as follows: > > 1) Mgmt-layer at dest queries dest host max supported nested_state size. > > (Which should be returned from kvm_check_extension(KVM_CAP_NESTED_STATE)) > > 2) Mgmt-layer at source initiate migration to dest with requesting QEMU to send nested_state > > matching dest max supported nested_state size. > > When saving nested state using KVM_GET_NESTED_STATE IOCTL, QEMU will specify in nested_state->size > > the *requested* size to be saved and KVM should be able to save only the information which matches > > the version that worked with that size. > > 3) After some sanity checks on received migration stream, dest host use KVM_SET_NESTED_STATE IOCTL. > > This IOCTL should deduce which information it should deploy based on given nested_state->size. > > > > This also makes me wonder if it's not just nicer to use nested_state->flags to specify which > > information is actually present on nested_state instead of managing versioning with nested_state->size. > > > > What are your opinions on this? > > > > -Liran > > > -- Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK