On 16/04/19 22:32, Sean Christopherson wrote: > The recent change to automatically calculate lapic_timer_advance_ns > introduced a handful of gnarly bugs, and also exposed a latent bug by > virtue of the advancement logic being enabled by default. Inspection > and debug revealed several other opportunities for minor improvements. > > The primary issue is that the auto-tuning of lapic_timer_advance_ns is > completely unbounded, e.g. there's nothing in the logic that prevents the > advancement from creeping up to hundreds of milliseconds. Adjusting the > timers by large amounts creates major discrepancies in the guest, e.g. a > timer that was configured to fire after multiple milliseconds may arrive > before the guest executes a single instruction. While technically correct > from a time perspective, it breaks a reasonable assumption from the guest > that it can execute some number of instructions between timer events. > > The other major issue is that the advancement is global, while TSC scaling > is done on a per-vCPU basic. Adjusting the advancement at runtime > exacerbates this as there is no protection against multiple vCPUs and/or > multiple VMs concurrently modifying the advancement value, e.g. it can > effectively become corrupted or never stabilize due to getting bounced all > over tarnation. > > As for the latent bug, when timer advancement was applied to the hv_timer, > i.e. the VMX preemption timer, the logic to trigger wait_for_lapic_timer() > was not updated. As a result, a timer interrupt emulated via the hv_timer > can easily arrive too early from a *time* perspective, as opposed to > simply arriving early from a "number of instructions executed" perspective. In a surprising twist of events, I've queued the last 5 patches. For the first four, I still plan to get them in 5.1 but I had some comments. They will have to wait for next week since I'll have a (short) Easter vacation, but there should be plenty of time. Paolo