On Thu, Aug 09, 2012 at 04:09:13PM -0300, Marcelo Tosatti wrote:
> On Thu, Aug 09, 2012 at 05:01:34PM +0300, Avi Kivity wrote:
> > On 08/09/2012 04:57 PM, Gerd Hoffmann wrote:
> > >   Hi,
> > > 
> > >>> +u64 kvm_tsc_khz(void)
> > >>> +{
> > >>> +    u32 eax, ebx, ecx, edx, msr;
> > >>> +    struct pvclock_vcpu_time_info time;
> > >>> +    u32 addr = (u32)(&time);
> > >>> +    u64 khz;
> > >>> +
> > >>> +    /* check presence and figure msr number */
> > >>> +    cpuid(KVM_CPUID_FEATURES, &eax, &ebx, &ecx, &edx);
> > >>> +    if (eax & KVM_FEATURE_CLOCKSOURCE2) {
> > >>> +        msr = MSR_KVM_SYSTEM_TIME_NEW;
> > >>> +    } else if (eax & KVM_FEATURE_CLOCKSOURCE) {
> > >>> +        msr = MSR_KVM_SYSTEM_TIME;
> > >>> +    } else {
> > >>> +        return 0;
> > >>> +    }
> > >>> +
> > >>> +    /* ask kvm hypervisor to fill struct */
> > >>> +    memset(&time, 0, sizeof(time));
> > >>> +    wrmsr(msr, addr | 1);
> > >>
> > >> How can this work?
> > > 
> > > It did in my testing, although maybe by pure luck ...
> > > 
> > >> There is a 64-byte alignment requirement.
> > > 
> > > 64 bytes?  Sure?  The whole struct is only 32 bytes in size ...
> > 
> > er, the documentation says 4 bytes (so stack alignment works).  I
> > distinctly remember having a large alignment requirement so we don't
> > cross a page or slot boundary... something's wrong here.
> > 
> > > 
> > > Easily fixable though, just need to grab some memory with memalign
> > > instead of using the stack.
> > > 
> > >>> +    wrmsr(msr, 0);
> > >>> +    if (time.version < 2 || time.tsc_to_system_mul == 0)
> > >>> +        return 0;
> > >>> +
> > >>> +    /* go figure tsc frequency */
> > >>> +    khz = pvclock_tsc_khz(&time);
> > >>> +    dprintf(1, "Using kvmclock, msr 0x%x, tsc %d MHz\n",
> > >>> +            msr, (u32)khz / 1000);
> > >>> +    return khz;
> > >>
> > >> That's a meaningless number.  You can be migrated to a cpu or a machine
> > >> with very different tsc.
> > > 
> > >> You want accurate time on kvm, don't use the tsc.
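The patch calls pvclock_tsc_khz(), whose body is not quoted in this thread. As a rough sketch, assuming it follows the Linux kernel helper of the same name: kvmclock scales a TSC delta to nanoseconds as ns = ((delta << tsc_shift) * tsc_to_system_mul) >> 32 (a right shift when tsc_shift is negative), so inverting that scale yields the TSC rate in kHz. The name tsc_khz_from_pvclock() and the sample values are hypothetical. The result only describes the host the guest happens to run on at that moment, which is the core of Avi's objection about migration.

```c
#include <stdint.h>

/* Hypothetical helper: recover the TSC frequency in kHz from the two
 * kvmclock scaling fields. kvmclock converts a TSC delta to nanoseconds as
 *   ns = ((delta << tsc_shift) * tsc_to_system_mul) >> 32
 * (shifting right by -tsc_shift when tsc_shift is negative), so one tick
 * lasts mul * 2^shift / 2^32 ns, and ticks per millisecond are
 *   khz = (10^6 * 2^32) / (mul * 2^shift). */
uint64_t tsc_khz_from_pvclock(uint32_t tsc_to_system_mul, int8_t tsc_shift)
{
    /* 10^6 ns per millisecond, pre-scaled by 2^32 to match the multiplier */
    uint64_t khz = 1000000ULL << 32;

    if (tsc_to_system_mul == 0)
        return 0;               /* unfilled struct, same guard as the patch */

    khz /= tsc_to_system_mul;
    if (tsc_shift < 0)
        khz <<= -tsc_shift;     /* delta was shifted right, so undo it here */
    else
        khz >>= tsc_shift;      /* delta was shifted left, so undo it here */
    return khz;
}
```

For example, tsc_to_system_mul = 0x80000000 with tsc_shift = 0 means 0.5 ns per tick, and the helper returns 2000000 (a 2 GHz TSC); the same multiplier with tsc_shift = 1 returns 1000000.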
> > > 
> > > seabios uses the tsc for timeout calculations only, so it doesn't need
> > > to be 100% accurate.  The order of magnitude should be correct though.
> > > The Linux kernel uses the value for delay loops too, so using it for the
> > > given purpose can't be *that* horrible after all ...
> > > 
> > > It is certainly an improvement over the current code which tries to
> > > calibrate the tsc and gets totally broken results in case the busy host
> > > happens to schedule the guest in the middle of calibration.
> > > 
> > > So what do you suggest?  The options I see are:
> > > 
> > >   (1) Use this patch (with alignment issue fixed of course).
> > >   (2) Do a full kvmclock implementation.  Feels a bit like overkill.
> > >   (3) SeaBIOS can fall back to the PIT for timing on machines which
> > >       have no TSC.  We could do that too in case we detect kvm ...
> > 
> > What sort of timeouts are these?  If seconds, maybe the rtc would be best.
> 
> I vote for 3 so nobody has to maintain kvmclock code in SeaBIOS and Gerd
That or pm timer.

> can fix the in-kernel PIT issues with GRUB (see Michael's message) while testing.
> 
What message exactly?

--
			Gleb.
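Gerd's proposed fix for the alignment problem is to take the buffer from memalign() instead of the stack. The sketch below shows the idea in ordinary user-space C, with C11 aligned_alloc() standing in for SeaBIOS's allocator (its exact helper names are not shown in this thread) and a copy of the 32-byte pvclock_vcpu_time_info layout from the Linux/KVM ABI. It assumes the constraint Avi remembers, that the structure must not straddle a page; aligning a 32-byte object to 32 bytes guarantees that for 4 KiB pages. In the guest the MSR takes a guest-physical address while this pointer is virtual, so only the arithmetic carries over. The helper names and GUEST_PAGE_SIZE are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* kvmclock per-vCPU time info as laid out by the Linux/KVM ABI (32 bytes),
 * reproduced here for illustration. */
struct pvclock_vcpu_time_info {
    uint32_t version;
    uint32_t pad0;
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
    uint8_t  flags;
    uint8_t  pad[2];
};

#define GUEST_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical helper: allocate a zeroed time-info buffer aligned to its own
 * size (32 bytes), so it can never cross a page boundary. */
struct pvclock_vcpu_time_info *alloc_time_info(void)
{
    size_t sz = sizeof(struct pvclock_vcpu_time_info);
    /* C11 aligned_alloc: size must be a multiple of the alignment */
    struct pvclock_vcpu_time_info *t = aligned_alloc(sz, sz);

    if (t)
        memset(t, 0, sz);       /* mirror the patch's memset before wrmsr */
    return t;
}

/* Returns nonzero if [addr, addr + len) lies entirely within one page. */
int within_one_page(uintptr_t addr, size_t len)
{
    return addr / GUEST_PAGE_SIZE == (addr + len - 1) / GUEST_PAGE_SIZE;
}

/* Value the patch writes to the kvmclock MSR: address with bit 0 set to
 * enable updates (the "addr | 1" in the quoted code). */
uint64_t kvmclock_msr_value(uintptr_t addr)
{
    return (uint64_t)addr | 1;
}
```

A stack buffer only gets the stack's alignment, so depending on where the frame lands it may straddle a page; the allocator-based version removes that dependence on luck.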