[Bug 217423] TSC synchronization issue in VM restore

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



https://bugzilla.kernel.org/show_bug.cgi?id=217423

--- Comment #4 from robert.hoo.linux@xxxxxxxxx ---
On 5/18/2023 8:48 PM, bugzilla-daemon@xxxxxxxxxx wrote:
[...]
(Sorry for late response)
>> Can elaborate more on this hrtimer issue/code path?
> 
> Below are the steps in detail, I traced them via bpftrace, to simplify the
> analysis, the preemption timer on host is disabled, guest is running with
> TSC timer deadline mode.
> 
> TSC changes before save VM:
> 1 create VM/VCPU, guest TSC start from 0 (VCPU initial value)
>    host_tsc0 = 0 + offset0
> 2 pause VM after guest start finished (about 200ms)
>    host_tsc1 = guest_tsc1 + offset0
>    guest_tsc1_deadline = guest_tsc1 + expire1
> 3 save VM state
>    save guest_tsc1 by reading MSR_IA32_TSC
>    save guest_tsc1_deadline by reading MSR_IA32_TSC_DEADLINE
> 
> TSC changes in restore VM (to simplify the analysis, step 4
> and step 5 ignore the host TSC changes in restore process):
> 4 create VM/VCPU, guest TSC start from 0 (VCPU initial value)
>    host_tsc3 = 0 + offset1
> 5 restore VM state
>    set MSR_IA32_TSC by guest_tsc1
>    set MSR_IA32_TSC_DEADLINE by guest_tsc1_deadline
> 6 start VM
>    VCPU_RUN
> 
> In step 5 setting MSR_IA32_TSC, because the guest_tsc1 is within 1 second,
> KVM will take this update as TSC synchronize, then skip update offset1.
> This means the guest TSC is still at 0 (initialize value).

IIUC, here no matter synchronizing = true or false, offset will always be 
updated, i.e. __kvm_synchronize_tsc() will be called. But the offset value will 
differ.

I guess your environment is tsc_stable, then offset = kvm->arch.cur_tsc_offset, 
which is 0. That is to say, the elapsed time isn't counted in by the heuristics 
method in current code, that's the culprit.

static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
{
        ...
        offset = kvm_compute_l1_tsc_offset(vcpu, data);
        ...

        /*
         * For a reliable TSC, we can match TSC offsets, and for an unstable
         * TSC, we add elapsed time in this computation.  We could let the
         * compensation code attempt to catch up if we fall behind, but
         * it's better to try to match offsets from the beginning.
          */
        if (synchronizing &&
            vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
                if (!kvm_check_tsc_unstable()) {
                        offset = kvm->arch.cur_tsc_offset;
                } else {
                        u64 delta = nsec_to_cycles(vcpu, elapsed);
                        data += delta;
                        offset = kvm_compute_l1_tsc_offset(vcpu, data);
                }
                matched = true;
        }

        __kvm_synchronize_tsc(vcpu, offset, data, ns, matched);
        raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
}


>> An alternative, I think, is to bypass this directly write IA32_MSR_TSC way
>> to set/sync TSC offsets, but follow new approach introduced in your VMM by
>>
>> commit 828ca89628bfcb1b8f27535025f69dd00eb55207
>> Author: Oliver Upton <oliver.upton@xxxxxxxxx>
>> Date:   Thu Sep 16 18:15:38 2021 +0000
>>
>>       KVM: x86: Expose TSC offset controls to userspace
>>
>> ...
>>
>> Documentation/virt/kvm/devices/vcpu.rst:
>>
>> 4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
>>
>> :Parameters: 64-bit unsigned TSC offset
>>
>> ...
>>
>> Specifies the guest's TSC offset relative to the host's TSC. The guest's
>> TSC is then derived by the following equation:
>>
>>     guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
>>
>> The following describes a possible algorithm to use for this purpose
>> ...
> 
> "TSC counts the time during which the VM was paused.", This new feature works
> for live migration. But if we save/restore VM with snapshot, the TSC should
> be
> paused either?
> 
Not sure what's host's TSC situation when host is, say, suspended/hibernated.
VM 
Save/Restore can refer to that.
But, the key point of this new approach is to use OFFSET rather than direct TSC 
value, this is like x86 TSC_ADJUST was introduced, and is preferred.
Via this new interface,
"... Ensure that the KVM_CLOCK_REALTIME flag is set in the provided structure.
KVM will advance the VM's kvmclock to account for elapsed time since recording 
the clock values.", therefore I think it can solve your problem, rather than 
modify the ancient and heuristics code at high risk.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux