On Tue, Dec 8, 2020 at 8:26 AM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
>
> On Tue, 2020-12-08 at 17:02 +0100, Thomas Gleixner wrote:
> > On Tue, Dec 08 2020 at 16:50, Maxim Levitsky wrote:
> > > On Mon, 2020-12-07 at 20:29 -0300, Marcelo Tosatti wrote:
> > > > > +This ioctl allows reconstructing the guest's IA32_TSC and TSC_ADJUST values
> > > > > +from the state obtained in the past by KVM_GET_TSC_STATE on the same vCPU.
> > > > > +
> > > > > +If 'KVM_TSC_STATE_TIMESTAMP_VALID' is set in flags,
> > > > > +KVM will adjust the guest TSC value by the time that has passed between the
> > > > > +moment the CLOCK_REALTIME timestamp was saved in the struct and the current
> > > > > +value of CLOCK_REALTIME, and set the guest's TSC to the new value.
> > > >
> > > > This introduces the wraparound bug in Linux timekeeping, doesn't it?
> >
> > Which bug?
> >
> > > It does.
> > > Could you prepare a reproducer for this bug so I get a better idea about
> > > what you are talking about?
> > >
> > > I assume you need a very long (like days' worth) jump to trigger this bug,
> > > and for such a case we can either work around it in qemu / the kernel
> > > or fix it in the guest kernel, and I strongly prefer the latter.
> > >
> > > Thomas, what do you think about it?
> >
> > For one I have no idea which bug you are talking about, and if the bug is
> > caused by the VMM then why would you "fix" it in the guest kernel?
>
> The "bug" is that if the VMM moves a hardware time counter (TSC or anything else)
> forward by a large enough value in one go,
> then the guest kernel will supposedly have an overflow in the time code.
> I don't consider this to be buggy VMM behavior, but rather a kernel
> bug that should be fixed (if this bug actually exists).
>
> Purely in theory this can even happen on real hardware if, for example, an SMM
> handler blocks a CPU from running for a long duration, or a hardware debugging
> interface does, or some other hardware-transparent sleep mechanism kicks in
> and blocks a CPU from running.
> (We do handle this gracefully for S3/S4.)

IIRC we introduced mul_u64_u32_shr() for roughly this reason, but we don't
seem to be using it in the relevant code paths. We should be able to use the
same basic math with wider intermediates to allow very large intervals
between updates.
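
To make the overflow concrete: the cycles-to-nanoseconds conversion is
essentially (delta * mult) >> shift, and with a 64-bit intermediate the
product wraps once delta * mult exceeds 2^64. Below is a minimal standalone
sketch of that wrap and of the wider-intermediate fix in the style of
mul_u64_u32_shr(); the MULT/SHIFT values and the cyc2ns_* helper names are
made up for illustration and are not the kernel's actual clocksource
parameters or code paths.

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative clocksource parameters (not real kernel values). With
 * mult == 2^22 and shift == 22 this models a 1 GHz counter, so ns
 * should simply equal cycles.
 */
#define MULT  4194304u   /* 2^22 */
#define SHIFT 22u

/* The fragile form: the 64-bit product delta * MULT wraps for large deltas. */
static uint64_t cyc2ns_narrow(uint64_t delta)
{
	return (delta * MULT) >> SHIFT;
}

/*
 * Wider intermediate: the same math mul_u64_u32_shr() uses on 64-bit
 * kernels, via the GCC/Clang 128-bit integer extension.
 */
static uint64_t cyc2ns_wide(uint64_t delta)
{
	return (uint64_t)(((unsigned __int128)delta * MULT) >> SHIFT);
}

int main(void)
{
	uint64_t delta = 1ULL << 50;	/* ~13 days at 1 GHz */

	/* 2^50 * 2^22 == 2^72 == 0 mod 2^64, so the narrow form prints 0. */
	printf("narrow: %llu ns\n", (unsigned long long)cyc2ns_narrow(delta));
	/* The wide form prints the correct 2^50 ns. */
	printf("wide:   %llu ns\n", (unsigned long long)cyc2ns_wide(delta));
	return 0;
}

With mult around 2^22 the narrow form wraps once the delta exceeds roughly
2^42 cycles, i.e. somewhere between half an hour and a bit over an hour at
typical TSC frequencies. That is why ordinary tick-to-tick deltas never
trip over it, while a guest whose TSC is jumped forward by days in one go
does.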