On Wed, Aug 29, 2012 at 05:11:12PM +0300, Michael S. Tsirkin wrote: > On Wed, Aug 29, 2012 at 10:49:04AM -0300, Eduardo Habkost wrote: > > > Normally CPUID will tell you if such important MSR is available. > > > So we can check that at destination. > > > > How can qemu check it, if when the qemu code was written when the MSR > > didn't even exist yet? > > > > (You could add an interface to check for that, but there's no KVM > > ioctl() to tell qemu "given these CPUID bits, can I safely drop this MSR > > that I don't even know about?") > > > > So this is what I suggest exactly. Add a new ioctl like that. > > > > > > > > > > > > > > > > > > > > > > > > > On the other hand, a mode of operation that doesn't require updating > > > > > > > > > QEMU every time there's a new bit of guest-visible state to be migrated > > > > > > > > > would be nice (just like the "-cpu host" mode, that doesn't require > > > > > > > > > updating QEMU for every new CPU feature, is nice for some use cases). I > > > > > > > > > just don't know how to make work with the current migration protocol. > > > > > > > > > > > > > > > > > > > > > > > > > I don't understand. What is the problem with the proposal? > > > > > > > > What will not work with our protocol? > > > > > > > > Can you give an example please? > > > > > > > > > > > > Yes: > > > > > > > > > > > > Suppose kernel 3.7 introduces KVM_FOO_MSR and CPUID_KVM_FOO. > > > > > > > > > > > > Also, suppose QEMU 1.2 doesn't know anything about KVM_FOO, because it > > > > > > was release before this feature was introduced. > > > > > > > > > > > > > > > > > > Then you run "qemu-1.2 -M pc-1.2" on a 3.7 host kernel. qemu-1.2 can do > > > > > > two things here: > > > > > > > > > > > > 1) Not enable CPUID_KVM_FOO > > > > > > > > > > > > In this case, the guest should not use KVM_FOO_MSR and the MSR does not > > > > > > need to be migrated (the guest may try to use it, but the behavior when > > > > > > trying to use it is undefined). Sending the MSR value when migrating > > > > > > would be useless. > > > > > > > > > > > > > > > > > > 2) Enable CPUID_KVM_FOO. > > > > > > > > > > > > In this case, the guest may try to use the feature and write something > > > > > > into KVM_FOO_MSR. Sending the MSR value when migrating is absolutely > > > > > > necessary > > > > > > > > > > > > --- > > > > > > > > > > > > Now assume you run "qemu-1.2 -M pc-1.2" in the destination host, running > > > > > > the 3.6 kernel (without KVM_FOO). > > > > > > > > > > > > Then qemu-1.2 receives the KVM_FOO_MSR data in the migration stream. In > > > > > > this case, qemu-1.2 simply can't decide if it's safe to drop the data > > > > > > (and not tell KVM about it), or not. > > > > > > If we simply send every MSR reported by the kernel, both the origin and > > > > > > destination qemu-1.2 processes can't ever know if the MSR value is > > > > > > important or not, because it doesn't know if it's part of the machine > > > > > > state that has to be kept consistent. > > > > > > We could introduce a mode of operation where _every_ MSR reported by KVM > > > > > > is important and sent by the origin (and also where every MSR must be > > > > > > handled by the destination, otherwise migration would fail). But this > > > > > > new mode would break migration compatibility between two hosts running > > > > > > the same machine-type, only because they are running different kernel > > > > > > versions. But it may be useful for some use cases, so maybe it's > > > > > > appropriate for a future "pc-next" machine-type (and for "-cpu host"), > > > > > > but not for "pc-<version>". > > > > > > > > > > > > > > > > In this example, we should migrate CPUID (don't we?). > > > > > Destination should validate that CPUID supplied by source > > > > > matches what it can support (doesn't it?). > > > > > > > > > > If we do and it does not, it's an unrelated bug: > > > > > CPUID changing under guest's feet. > > > > > > > > CPUID changing under guest's feet is another problem, that we also have > > > > to solve. > > > > But we also have the problem of migration compatibility > > > > between different host kernels. > > > > > > So here is the solution for both: on destination pass CPUID to kvm and > > > it should come back unchanged. If it changed you fail migration. > > > > This doesn't solve the problem of having predictable migration > > compatibility for "-M pc-<old-version>". > > > > The whole point of machine-types is to expose the "same machine" to the > > guest, even if you change the hardware or host kernel. "qemu -M pc-1.x" > > must expose the same machine configuration to the guest, it doesn 't > > matter what's the host kernel version. > > I'd tend to disagree. The point is to make migration work > and avoid things like windows re-activation trigger. > Let's not be purists - many internal changes in qemu > introduce subtle guest visible changes, if even in timing. Yes, but in this case silently adding MSRs and/or new feature bits after a kernel upgrade is not just a "subtle" change, it's a change that breaks migration. > > > > > > > > > > > > Note that I am not saying that migrating all MSRs is wrong. It _is_ > > > > good, as long as: > > > > - The destination never ignores any of the incoming MSR values. > > > > > > What I am saying for MSRs added in last 2 years it is OK to ignore > > > because CPUID check will tell you if it is supported > > > and fail migration. > > > > Existing MSRs are easy to make work. The problem is about MSRs added to > > the "msrs_to_save" list in the future. > > > > Also, the problem is not about being "safe" to ignore the MSR values, > > it's about being "correct" (part of the expected behavior of the virtual > > machine). The fact that most guests doen't crash when the virtual > > machine doesn't behave as it should doesn't mean we should do it. > > > > Either the MSR is part of the machine state (and relevant to the guest), > > or not. If it is relevant, it must be _always_ migrated and never > > dropped by the destination. If it is not, it's useless to migrate it. > > > > Yes. But it is better to keep all knowledge which is which > in one place which is in kvm. True, but doing this may need additional interfaces, not provided by KVM yet (see below). > > > > > > > > - We don't do that by default on "pc-<version>", or we defeat the > > > > purpose of machine-types. > > > > > > > > I'll try to enumerate the problems I am trying to address (that are > > > > related, but are separate problems): > > > > > > > > - MSR not being migrated when it should: > > > > - Possible solution: migrate all MSRs even if qemu doesn't know what > > > > they are. > > > > - Constraint: migration destination must _never_ ignore any incoming > > > > MSR value, because it can't decide if it is important to the guest > > > > or not (with the current KVM interfaces). > > > > - Problem with this solution: if we do that by default on > > > > "pc-<version>", we break migration compatibility between hosts > > > > with different kernel versions. > > > > > > Solution: add vcpu ioctl that tells you which MSRs to migrate > > > (on source), depending on CPUID. > > > > This may be a solution for old-kernel => new-kernel migration, yes. But > > this still doesn't solve the migration compatibility problem for > > new-kernel => old-kernel migration (see below). > > > > > > > > > > > - Changing CPUID bits under guest's feet. > > > > - Proposed solution: migrating CPUID bits, refuse migration if > > > > destination doesn't support the same bits. > > > > - It solves the compatibility problem for migration to a newer > > > > kernel, but not to an older kernel. It helps to solve part of > > > > the problem, but not all. > > > > > > How does not it save all of the problem? If destination kernel > > > can support cpuid, then we are fine - it is new enough. > > > > See below. The problem is being able to migrate to an older host. > > That's the whole point of machine-types! > > > > > > > > > - Alternative solution: simply make the resulting CPUID bits not be a > > > > function of the host kernel capabilities, but only of the qemu > > > > configuration (except on "-cpu host" and "-M pc-next"). > > > > > > This perpetuates existing duplication of code between > > > kvm and qemu. We are better off with logic in 1 place. > > > > Yes, it does, and I would love to avoid having the list inside QEMU, > > too. But we can't avoid that because each machine-type defines a set of > > available features/MSRs, so we have to have a machine-type => > > list-of-features list somewhere, unfortunately. > > > > Yes. But let us have machine type->cpuid list in qemu. > kvm will have the cpuid->MSR logic. That would work, too. > > > > > > > > > - Migration compatibility/predictability: > > > > - See my example above: feature introduced in a newer kernel, > > > > migration to an older kernel. > > > > > > If it is enabled then migration fails. > > > > > > > - The only way I see to guarantee this is to never enable unknown > > > > CPUID bits or MSRs. People who don't care about predictable > > > > migration compatibility can use "-M pc-next", then. > > > > > > > > > > Guarantee what? > > > > Guarantee that "-M pc-1.1" machines can be migrated to any host that is > > already capable of running "-M pc-1.1". > > > > I can't change the past. I am suggesting forward compatible > approach so we'll be able to guarantee this for > -M pc-1.2 and on. pc-1.1 is already broken, but at least pc-1.3 and later should work this way (so, for example, "qemu-1.5 -M pc-1.3" can be guaranteed to be migratable to any host that can run "qemu -M pc-1.3"). > > > > Just check dst can support all msrs and cpuid bits of src. > > > Way to check is to ask kvm :) Not to add logic in qemu. > > > > Checking and making migration fail when it has to fail is not the > > problem. The problem is that now "qemu -M pc-1.x" will result in a > > different machine, depending on the host kernel version. This causes two > > problems: > > > > - Now you don't know if your existing machine can be migrated to a > > host running an older kernel (because now migration can fail even when > > you are using the same machine-type on both sides). > > - Different VMs using the same machine-type will get different machines > > (with different sets of features), because they are running on > > different kernel versions. > > > > This may be acceptable for "pc-next", but not for "pc-<version>". > > So you can whitelist CPUID bits. But leave MSRs alone, it is nasty > enough with CPUID. That may be a solution, yes, as long as we have an interface to map CPUID bits to a list of "relevant MSRs that have to be migrated", like you suggests (that we don't have yet). So, in short: - A machine-type must define a list of machine "features", many of them exposed through CPUID bits - The exposure of a feature (in this case, always throught a CPUID bit?) may make migration of a MSR necessary. - We need the CPUID => MSRs mapping somewhere. The "machine-type => features" mapping has to live in QEMU, but the "features => MSRs" mapping may live in the kernel, if you want to. But today we don't have an interface in KVM for that. -- Eduardo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html