On Thu, Jan 05, 2017 at 10:19:50AM -0200, Eduardo Habkost wrote: > On Thu, Jan 05, 2017 at 08:48:32AM -0200, Marcelo Tosatti wrote: > > On Wed, Jan 04, 2017 at 11:36:31PM -0200, Eduardo Habkost wrote: > > > On Wed, Jan 04, 2017 at 08:26:27PM -0200, Marcelo Tosatti wrote: > > > > On Wed, Jan 04, 2017 at 05:59:17PM -0200, Eduardo Habkost wrote: > > > > > On Wed, Jan 04, 2017 at 11:39:16AM -0200, Eduardo Habkost wrote: > > > > > > On Wed, Jan 04, 2017 at 09:56:56AM -0200, Marcelo Tosatti wrote: > > > > > > > On Tue, Dec 27, 2016 at 05:21:20PM -0200, Eduardo Habkost wrote: > > > > > > > > Instead of blocking migration on the source when invtsc is > > > > > > > > enabled, rely on the migration destination to ensure there's no > > > > > > > > TSC frequency mismatch. > > > > > > > > > > > > > > > > We can't allow migration unconditionally because we don't know if > > > > > > > > the destination is a QEMU version that is really going to ensure > > > > > > > > there's no TSC frequency mismatch. To ensure we are migrating to > > > > > > > > a destination that won't ignore SET_TSC_KHZ errors, allow invtsc > > > > > > > > migration only on pc-*-2.9 and newer. > > > > > > > > > > > > > > > > Signed-off-by: Eduardo Habkost <ehabkost@xxxxxxxxxx> > > > > > [...] > > > > > > > > @@ -2655,12 +2656,14 @@ int kvm_arch_put_registers(CPUState *cpu, int level) > > > > > > > > } > > > > > > > > > > > > > > > > if (level == KVM_PUT_FULL_STATE) { > > > > > > > > - /* We don't check for kvm_arch_set_tsc_khz() errors here, > > > > > > > > - * because TSC frequency mismatch shouldn't abort migration, > > > > > > > > - * unless the user explicitly asked for a more strict TSC > > > > > > > > - * setting (e.g. using an explicit "tsc-freq" option). > > > > > > > > + /* Migration TSC frequency mismatch is fatal only if we are > > > > > > > > + * actually reporting Invariant TSC to the guest. > > > > > > > > */ > > > > > > > > - kvm_arch_set_tsc_khz(cpu); > > > > > > > > + ret = kvm_arch_set_tsc_khz(cpu); > > > > > > > > + if ((x86_cpu->env.features[FEAT_8000_0007_EDX] & CPUID_APM_INVTSC) && > > > > > > > > + ret < 0) { > > > > > > > > + return ret; > > > > > > > > + } > > > > > > > > } > > > > > > > > > > > > > > Will the guest continue in the source in this case? > > > > > > > > > > > > > > I think this is past the point where migration has been declared > > > > > > > successful. > > > > > > > > > > > > > > Otherwise looks good. > > > > > > > > > > > > Good point. I will make additional tests and see if there's some > > > > > > other place where the kvm_arch_set_tsc_khz() call can be moved > > > > > > to. > > > > > > > > > > So, if we solve this and do something on (for example) post_load, > > > > > we still have a problem: device state is migrated after RAM. This > > > > > means QEMU will check for TSC scaling and abort migration very > > > > > late. > > > > > > > > > > We could solve that by manually registering a SaveVMHandler that > > > > > will send the TSC frequency on save_live_setup, so migration is > > > > > aborted earlier. > > > > > > > > > > But: this sounds like just a complex hack to work around the real > > > > > problems: > > > > > > > > > > 1) TSC frequency is guest-visible, and anything that affects > > > > > guest ABI should depend on the VM configuration, not on host > > > > > capabilities; > > > > > > > > Well not really: the TSC frequency where the guest starts > > > > is the frequency the guest software expects. > > > > So it does depend on host capabilities. > > > > > > Could you explain where this expectation comes from? I can see > > > multiple cases where choosing the TSC frequency where the VM > > > starts is not the best option. > > > > 1. Boot guest. > > 2. Calibrate TSC. > > 3. Use delay() with TSC calibration above, or > > use TSC to measure the passage of time (TSC clock > > in userspace). > > If TSC scaling is available, using a different frequency should > be safe, shouldn't it? Otherwise, migrating with TSC scaling > wouldn't be safe either. Yes, but if you don't have TSC scaling, you have to boot with the host frequency. > Anyway: I don't disagree the starting host frequency is a good > default. It is. But I don't think it's the best option on all > cases. Agree. > > > > > I am considering two possible scenarios below. You probably have > > > another scenario in mind, so it would be useful if you could > > > describe it so we can consider possible solutions. > > > > > > > > > Scenario 1: > > > > > > You have two hosts: A and B, both with TSC scaling. They have > > > different frequencies. The VM can be started on either one of > > > them, and can be migrated to either one depending on the policy > > > of management software. > > > > > > Maybe B is a backup host, the VM is expected to run most times on > > > host A, and we want to use the TSC frequency from host A every > > > time. Maybe it's the opposite and we want to use the frequency of > > > B. Maybe we want to use the lowest frequency of both, maybe the > > > highest one. We (QEMU developers) can recommend policy to libvirt > > > developers, but a given QEMU process doesn't have information to > > > decide what's best. > > > > I can't see any practical scenario here, you will always want > > to start with TSC frequency of the host where the VM was started. > > > > If i am mistaken, please describe a practical case. > > > > (If a practical scenario comes up, and there is a use-case > > for setting the TSC frequency on startup, lets say > > a Windows VM which fails to boot if the TSC frequency > > is too high, then it should be supported... But the > > only known scenario to me, applying to 99.999% of cases, > > is to expose the TSC frequency where the guest booted at). > > I don't have any specific case: my point is that I can't tell > what's the best frequency if I don't know where the hosts are > expected to be migrated to. > > You claim that using the starting host frequency is the best > option on the vast majority of cases. Maybe it's true, and it > would be a good default. The only problem is that this default > affects migratability: > > > > > > Scenario 2: > > > > > > Host A has TSC scaling, host B doesn't have TSC scaling. We want > > > to be able to start the VM on host A, and migrate to B. In this > > > case, the only possible solution is to use B's frequency when > > > starting the VM. The QEMU process doesn't have enough information > > > to make that decision. > > > > That is a good point. But again, its a special case and > > should be supported by -cpu xxx,tsc-frequency=zzzz. > > > > However, for the vast majority of 99.999% cases, the issue > > can be handled entirely in QEMU, without libvirt involvement, > > and without adding extra steps to the management software. > > I agree it should cover most cases. The only problem here is that > it can break migration in unexpected ways. > > Then my point is: assuming that libvirt will prefer to require > explicit TSC frequency configuration to enable invtsc migration > (instead of getting unpredictable migration compatibility), is > the added complexity to migration code worth the effort, if > choosing an explicit frequency is safer and more predictable? I > believe this is where we disagree. > > > > > > > > 2) Setting TSC frequency depending on the host will make > > > > > migratability unpredictable for management software: the same > > > > > VM config could be migratable to host A when started on host > > > > > B, but not migratable to host A when started on host C. > > > > > > > > Well, just check the frequency. > > > > > > What do you mean by "just check the frequency"? What exactly > > > should management software do? > > > > VM booted on host-A can be migrated to host-B if TSC > > frequency matches. > > Except when TSC scaling is available. Then you may or may not > migrate from A to B, and you don't know that unless you actually > try to migrate. > > > > > > Whatever management software do, if you just use the source host > > > frequency you can get into a situation where you can start a VM > > > on host A and migrate it to B, but can't start the VM on host B > > > and migrate it to A. This would be confusing for users, and > > > probably break assumptions of existing management software. > > > > Well this is a side effect of invariant TSC and migration. > > > > Look, i agree with all your points, but my point is this: i personally > > prefer to handle the 99.999% case, which is the TSC frequency exposed is the one > > from the host where the guest booted, in QEMU entirely (for example, to > > make life easier for people who run qemu manually such as myself). > > Right. I don't mind not covering 100% of cases. But I worry if > the cases not covered by us behave unexpectedly and > unpredictably. If we caused a failure every time an > unsafe/unsupported config was used, it would be OK. But making > the ability to migrate unpredictable is a more serious issue IMO. > > I believe we both agree about how the final version should > behave, but disagree about what is more important in the first > version: > > * My proposal for the first version means: > * People would have to configure TSC frequency manually if > they want invtsc + migration (until we also provide a > mechanism to query the host TSC frequency so > management/scripts could choose it as default); > * Adding more code to libvirt. > * Your proposal for the first version means: > * The ability to migrate won't be predictable by libvirt; > * Extra complexity on the migration code to ensure > we abort migration on mismatch. > > We seem to be weighting those issues differently. To me, having > predictability on migration ability since the first version is > more important to me than making configuration easier (on the > first version). > > > > > > > > I suggest we allow migration with invtsc if and only if > > > > > tsc-frequency is set explicitly by management software. In other > > > > > words, apply only patches 1/4 and 2/4 from this series. After > > > > > that, we will need libvirt support for configuring tsc-frequency. > > > > > > > > I don't like that for the following reasons: > > > > > > > > * It moves low level complexity from QEMU/KVM to libvirt > > > > (libvirt has to call KVM_GET_TSC_KHZ from a vcpu thread). > > > > > > It doesn't. It could ask QEMU for that information (the new > > > query-cpu-model-expansion QMP command would allow that). > > > > > > Or, alternatively, it could just let the user choose the > > > frequency. > > > > Again, you want to expose the host where the VM booted in most > > cases (except the ones you list above). > > > > > It's probably acceptable for many use cases where > > > invtsc+migration is important. > > > > Ok, thats better. Can you add a patch to your series with the steps > > as how mgmt software should proceed ? > > > > > > * It makes it difficult to run QEMU manually (i use QEMU > > > > manually all the time). > > > > > > I believe the set of people that: 1) need invtsc; 2) need > > > migration to work; and 3) run QEMU manually will be able to add > > > "tsc-frequency=XXX" to the command-line easily. :) > > > > Ok, so migration is only allowed if tsc-frequency= is specified. > > > > > > * It requires patching libvirt. > > > > > > > > In my experience things work better when the functionality is > > > > not split across libvirt/qemu. > > > > > > In my experience things break when management software is the > > > only component able to make a decision but we don't provide > > > mechanisms to let management software make that decision. > > > > > > The TSC frequency affects migratability to hosts. Choose the > > > wrong frequency, and you might not be able to migrate. Only > > > management software knows to which hosts the VM could be migrated > > > in the future, and which TSC frequency would allow that. > > > > True. > > > > > > Can't this be fixed in QEMU? Just check that destination host supports > > > > TSC scaling before migration (or that KVM_GET_TSC_KHZ return value > > > > matches on source and destination). > > > > > > > > > > This solves only one use case: where you want to expose the > > > starting host TSC frequency to the guest. It doesn't cover any > > > scenario where the TSC frequency of the starting host isn't the > > > best one. (See the two scenarios I described above) > > > > True. > > > > > You seem to have a specific use case in mind. If you describe it, > > > we could decide if it's worth the extra migration code complexity > > > to deal with invtsc migration without explicit tsc-frequency. By > > > now, I am not convinced yet. > > > > I don't have any specific use case in mind. I'm just trying to > > avoid moving complexity to libvirt which in my experience is a > > the best thing to do. > > I think both our proposals make things harder for libvirt in > different ways. I just want to make the complexity explicit for > libvirt, instead of giving them the illusion of safety (making > migration break in ways libvirt can't detect). > > Anyway, your points are still valid and it would be still useful > to do what you propose. I will give it a try on a new version of > patch 4/4 that can abort migration earlier. But note that I > expect some pushback from other maintainers because of the > complexity of the code. If that happens, be aware that I won't be > able to justify the added complexity because I would still think > it's not worth it. > > > > > > Maybe your use case just needs a explicit "invtsc-migration=on" > > > command-line flag without a mechanism to abort migration on > > > mismatch? I can't tell. > > > > Again, there is no special case. > > > > > Note that even if we follow your suggestion and implement an > > > alternative version of patch 4/4 to cover your use case, I will > > > strongly recommend libvirt developers to support configuring TSC > > > frequency if they want to support invtsc + migration without > > > surprising/unpredictable restrictions on migratability. > > > > Well, alright. If you make the TSC frequency of the host > > available to mgmt software as you describe, and write the steps mgmt > > software should take, i'm good. > > I plan to. The problem is that the mechanism to query the host > frequency may be unavailable in the first version. Well just export KVM_GET_TSC_KHZ in a QMP command right? Its pretty easy. Let me know if you need any help coding or testing. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html