On Wed, Aug 21, 2019 at 10:17 AM Sean Christopherson
<sean.j.christopherson@xxxxxxxxx> wrote:
> On Tue, Aug 20, 2019 at 12:34:20AM -0700, Matt Delco wrote:
> > On Mon, Aug 19, 2019 at 10:09 PM Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
> > >
> > > On Tue, 20 Aug 2019 at 12:10, Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
> > > > These tests pass on bare-metal.
> > >
> > > Good to know this. In addition, in linux apic driver, during mode
> > > switch __setup_APIC_LVTT() always sets lapic_timer_period(number of
> > > clock cycles per jiffy)/APIC_DIVISOR to APIC_TMICT which can avoid the
> > > issue Matt report. So is it because there is no such stuff in windows
> > > or the windows version which Matt testing is too old?
> >
> > I'm using Windows 10 (May 2019). Multimedia apps on Windows tend to
> > request higher frequency clocks, and this in turn can affect how the
> > kernel configures HW timers. I may need to examine how Windows
> > typically interacts with the APIC timer and see if/how this changes
> > when Skype is used. The frequent timer mode changes are not something
> > I'd expect a reasonably behaved kernel to do.
>
> Have you tried analyzing the guest code? If we're lucky, doing so might
> provide insight into what's going awry.
>
> E.g.:
>
> Are the LVTT/TMICT writes are coming from a single blob/sequence of code
> in the guest?
>
> Is the unpaired LVTT coming from the same code sequence or is it a new
> rip entirely?
>
> Can you dump the relevant asm code sequences?

I have changed gears to do runtime behavioral analysis, given the
reports that the code change I proposed would deviate from hardware.

The time between writes for TMICT-then-LVTT is typically quite small,
and much smaller than the average for LVTT-then-TMICT. In the lead-up
to the point where time stops there are alternating writes to TMICT and
LVTT, where each write to LVTT alternates between setting periodic and
one-shot mode.
The final write to LVTT (which sets periodic) comes more than 1.5 ms
after the prior TMICT write (about 100x the typical delay), which might
mean the kernel opted not to write to TMICT but did so on the next
clock tick.

The host kernel and kvm I've been testing with seem to be firing the
timer callbacks sooner than requested, so if the guest kernel has
optimizations based on whether it thinks there's time left on the APIC
timer, then this might be causing problems. I'm going to try to pull in
some of the newer kvm changes that appear to compensate for the early
delivery and see if that also makes the time hang symptom disappear (if
not, I may start to examine things from the guest side).

Thanks.
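
P.S. For anyone who wants to repeat the gap analysis described above,
here is a rough sketch of the idea in Python. It is not the tooling
actually used. The trace format (time in microseconds, register name,
value), the helper names, the sample values and the outlier factor of
50 are all made up for illustration. The only hardware detail relied on
is that bits 18:17 of the LVT Timer register select the timer mode
(00 one-shot, 01 periodic, 10 TSC-deadline).

```python
from statistics import median

# Bits 18:17 of the LVT Timer register select the timer mode.
LVTT_MODE_SHIFT = 17
LVTT_MODE_MASK = 0x3
MODES = {0: "one-shot", 1: "periodic", 2: "tsc-deadline"}


def lvtt_mode(value):
    """Decode the timer mode from a value written to LVTT."""
    return MODES.get((value >> LVTT_MODE_SHIFT) & LVTT_MODE_MASK, "reserved")


def gaps(events):
    """Group inter-write gaps by (previous register, current register).

    events is a time-sorted list of (time_us, reg, value) tuples; this
    layout is hypothetical. Returns {(prev_reg, reg): [(gap_us, index)]},
    where index is the position of the later write in events.
    """
    out = {}
    for i in range(1, len(events)):
        t0, r0, _ = events[i - 1]
        t1, r1, _ = events[i]
        if r0 != r1:
            out.setdefault((r0, r1), []).append((t1 - t0, i))
    return out


def outliers(gap_list, factor=50):
    """Return the gaps that exceed factor times the median gap."""
    typical = median(g for g, _ in gap_list)
    return [(g, i) for g, i in gap_list if g > factor * typical]


# Invented sample trace: LVTT flips between periodic and one-shot, and
# the last LVTT write arrives long after the preceding TMICT write.
PERIODIC = (1 << LVTT_MODE_SHIFT) | 0xD1
ONESHOT = 0xD1
trace = [
    (0, "LVTT", PERIODIC),
    (500, "TMICT", 0x1000),
    (515, "LVTT", ONESHOT),
    (1000, "TMICT", 0x1000),
    (1015, "LVTT", PERIODIC),
    (1500, "TMICT", 0x1000),
    (1516, "LVTT", ONESHOT),
    (2000, "TMICT", 0x1000),
    (3600, "LVTT", PERIODIC),
]

late = outliers(gaps(trace)[("TMICT", "LVTT")])
for gap_us, idx in late:
    print(gap_us, lvtt_mode(trace[idx][2]))
```

Using the median rather than the mean keeps a single very late write
from inflating the "typical" delay it is being compared against.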