On Tue, 2023-11-07 at 15:07 -0800, Dongli Zhang wrote:
> Thank you very much for the detailed explanation.
>
> I agree it is important to resolve the "now" problem. I guess the KVM lapic
> deadline timer has the "now" problem as well.

I think so. And quite gratuitously so, since it just does:

	now = ktime_get();
	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());

Couldn't that trivially be changed to use kvm_get_monotonic_and_clockread()?

Thankfully, the LAPIC deadline is defined in the time domain of the guest
TSC, not the kvmclock, so it doesn't suffer the same drift issue as the
Xen timer.

> I just notice my question missed a key prerequisite:
>
> Would you mind helping explain the time domain of the "oneshot.timeout_abs_ns"?
>
> While it is the absolute nanosecond value at the VM side, on which time domain
> it is based?

It's the kvmclock. Xen offers a Xen PV clock to its guests using *precisely*
the same pvclock structure as KVM does.

> 1. Is oneshot.timeout_abs_ns based on the xen pvclock (freq=NSEC_PER_SEC)?
>
> 2. Is oneshot.timeout_abs_ns based on tsc from VM side?
>
> 3. Is oneshot.timeout_abs_ns based on monotonic/raw clock at VM side?
>
> 4. Or it is based on wallclock?
>
> I think the OS does not have a concept of nanoseconds. It is derived from a
> clocksource.

It's the kvmclock. The guest derives it from the guest TSC using the pvclock
information (mul/shift/offset) that KVM provides to the guest.

The kvm_setup_guest_pvclock() function is potentially called *three* times
from kvm_guest_time_update(): once for the KVM pv time MSR, once for the
pvclock structure in the Xen vcpu_info, and finally for the pvclock
structure which Xen makes available to userspace for vDSO timekeeping.

> If it is based on pvclock, is it based on the pvclock from a specific vCPU, as
> both pvclock and timer are per-vCPU.

Yes, it is per-vCPU. Although in the sane case the TSCs on all vCPUs will
match, and the mul/shift/offset provided by KVM won't actually differ.
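To make the derivation concrete, here is a minimal C sketch (not KVM's or Xen's actual code) of how a guest turns its TSC into kvmclock nanoseconds with the pvclock mul/shift/offset, and of how a host could turn a Xen oneshot timeout into a host expiry time from a host "now" and a guest TSC sampled together. The struct pvclock_params type and the helpers kvmclock_ns() and xen_timer_host_expiry_ns() are hypothetical; the field names mirror struct pvclock_vcpu_time_info, and the 128-bit multiply assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-vCPU pvclock data; the field names
 * mirror struct pvclock_vcpu_time_info. */
struct pvclock_params {
	uint64_t tsc_timestamp;     /* guest TSC at which system_time was taken */
	uint64_t system_time;       /* kvmclock nanoseconds at tsc_timestamp */
	uint32_t tsc_to_system_mul; /* 32.32 fixed-point TSC-to-ns multiplier */
	int8_t   tsc_shift;         /* shift applied to the TSC delta first */
};

/* Scale a TSC delta to nanoseconds: pre-shift, then multiply by a 32.32
 * fixed-point factor (unsigned __int128 is a GCC/Clang extension). */
static inline uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul,
					   int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}

/* kvmclock time (ns) for a guest TSC value read on this vCPU. */
static inline uint64_t kvmclock_ns(const struct pvclock_params *p,
				   uint64_t guest_tsc)
{
	/* Sketch precondition: the TSC was read after the pvclock update. */
	assert(guest_tsc >= p->tsc_timestamp);
	return p->system_time +
	       pvclock_scale_delta(guest_tsc - p->tsc_timestamp,
				   p->tsc_to_system_mul, p->tsc_shift);
}

/* Hypothetical helper: host expiry time for a Xen oneshot timeout given in
 * kvmclock ns. host_now_ns and guest_tsc must be sampled at the same
 * instant, otherwise the gap between the two reads becomes timer error
 * (the "now" problem). An already-expired timeout fires at host_now_ns. */
static inline uint64_t xen_timer_host_expiry_ns(const struct pvclock_params *p,
						uint64_t timeout_abs_ns,
						uint64_t guest_tsc,
						uint64_t host_now_ns)
{
	uint64_t guest_now_ns = kvmclock_ns(p, guest_tsc);
	uint64_t delta = timeout_abs_ns > guest_now_ns ?
			 timeout_abs_ns - guest_now_ns : 0;

	return host_now_ns + delta;
}
```

For example, with a 2 GHz guest TSC, shift 0 and mul 0x80000000 scale a TSC delta by one half. Two vCPUs whose TSCs differ by a constant offset get different tsc_timestamp values but the same system_time, so both compute the same kvmclock time at the same instant.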
Even in the insane case where guest TSCs are out of sync, surely the
pvclock information will differ only in order to ensure that the *result*
in nanoseconds does not?

I conveniently ducked this question in my patch by only supporting the
CONSTANT_TSC case, and not the case where we happen to know the
(potentially different) TSC frequencies on all the different pCPUs and
vCPUs.

>
> E.g., according to the KVM lapic deadline timer, all values are based on (1) the
> tsc value, (2) on the current vCPU.
>
>
> static void start_sw_tscdeadline(struct kvm_lapic *apic)
> {
> 	struct kvm_timer *ktimer = &apic->lapic_timer;
> 	u64 guest_tsc, tscdeadline = ktimer->tscdeadline;
> 	u64 ns = 0;
> 	ktime_t expire;
> 	struct kvm_vcpu *vcpu = apic->vcpu;
> 	unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
> 	unsigned long flags;
> 	ktime_t now;
>
> 	if (unlikely(!tscdeadline || !this_tsc_khz))
> 		return;
>
> 	local_irq_save(flags);
>
> 	now = ktime_get();
> 	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
>
> 	ns = (tscdeadline - guest_tsc) * 1000000ULL;
> 	do_div(ns, this_tsc_khz);
>
>
> Sorry if I make the question very confusing. The core question is: where and
> from which clocksource the abs nanosecond value is from? What will happen if the
> Xen VM uses HPET as clocksource, while xen timer as clock event?

If the guest uses HPET as clocksource and the Xen timer as clockevents, then
keeping itself in sync is the *guest's* problem. The Xen timer is defined
in terms of nanoseconds since guest start, as provided in the pvclock
information described above.

Hope that helps!