Re: [PATCH v10 3/7] arm64: hyperv: Add Hyper-V clocksource/clockevent support

Marc Zyngier <maz@xxxxxxxxxx> · Wed, 23 Jun 2021 09:56:04 +0100

On Tue, 22 Jun 2021 10:54:12 +0100,
Mark Rutland <mark.rutland@xxxxxxx> wrote:
> 
> Hi Michael,
> 
> Thanks for all this; comments inline below. I've added Marc Zyngier, who
> co-maintains the architected timer code.
> 
> On Mon, Jun 14, 2021 at 02:42:23AM +0000, Michael Kelley wrote:
> > From: Mark Rutland <mark.rutland@xxxxxxx> Sent: Thursday, June 10, 2021 9:45 AM
> > > On Tue, Jun 08, 2021 at 03:36:06PM +0000, Michael Kelley wrote:
> > > > I've had a couple rounds of discussions with the Hyper-V team.   For
> > > > the clocksource we've agreed to table the live migration discussion, and
> > > > I'll resubmit the code so that arm_arch_timer.c provides the
> > > > standard arch_sys_counter clocksource.  As noted previously, this just
> > > > works for a Hyper-V guest.  The live migration discussion may come
> > > > back later after a deeper investigation by Hyper-V.
> > > 
> > > Great; thanks for this!
> > > 
> > > > For clockevents, there's not a near term fix.  It's more than just plumbing
> > > > an interrupt for Hyper-V to virtualize the ARM64 arch timer in a guest VM.
> > > > From their perspective there's also benefit in having a timer abstraction
> > > > that's independent of the architecture, and in the Linux guest, the STIMER
> > > > code is common across x86/x64 and ARM64.  It follows the standard Linux
> > > > clockevents model, as it should. The code is already in use in out-of-tree
> > > > builds in the Linux VMs included in Windows 10 on ARM64 as part of the
> > > > so-called "Windows Subsystem for Linux".
> > > >
> > > > So I'm hoping we can get this core support for ARM64 guests on Hyper-V
> > > > into upstream using the existing STIMER support.  At some point, Hyper-V
> > > > will do the virtualization of the ARM64 arch timer, but we don't want to
> > > > have to stay out-of-tree until after that happens.
> > > 
> > > My main concern here is making sure that we can rely on architected
> > > properties, and don't have to special-case architected bits for hyperv
> > > (or any other hypervisor), since that inevitably causes longer-term
> > > pain.
> > > 
> > > While in abstract I'm not as worried about using the timer
> > > clock_event_device specifically, that same driver provides the
> > > clocksource and the event stream, and I want those to work as usual,
> > > without being tied into the hyperv code. IIUC that will require some
> > > work, since the driver won't register if the GTDT is missing timer
> > > interrupts (or if there is no GTDT).
> > > 
> > > I think it really depends on what that looks like.
> > 
> > Mark,
> > 
> > Here are the details:
> > 
> > The existing initialization and registration code in arm_arch_timer.c
> > works in a Hyper-V guest with no changes.  As previously mentioned,
> > the GTDT exists and is correctly populated.  Even though it isn't used,
> > there's a PPI INTID specified for the virtual timer, just so
> > the "arm_sys_timer" clockevent can be initialized and registered.
> > The IRQ shows up in the output of "cat /proc/interrupts" with zero counts
> > for all CPUs since no interrupts are ever generated. The EL1 virtual
> > timer registers (CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0)
> > are accessible in the VM.  The "arm_sys_timer" clockevent is left in
> > a shutdown state with CNTV_CTL_EL0.ENABLE set to zero when the
> > Hyper-V STIMER clockevent is registered with a higher rating.
> 
> This concerns me, since we're lying to the kernel, and assuming that it
> will never try to use this timer. I appreciate that evidently we don't
> happen to rely on that today if you register a higher priority timer,
> but that does open us up to future fragility (e.g. if we added sanity
> checks when registering timers), and IIRC there are ways for userspace
> to change the clockevent device today.

Indeed. Userspace can perfectly unbind the clockevent using
/sys/devices/system/clockevents/clockevent*/unbind_device, and the
kernel will be happy to switch to the next per-cpu timer, which
happens to be the arch timer. Oh wait...

>
> > Event streams are initialized and the __delay() implementation
> > for ARM64 inside the kernel works.  However, on the Ampere
> > eMAG hardware I'm using for testing, the WFE instruction returns
> > more quickly than it should even though the event stream fields in
> > CNTKCTL_EL1 are correct.  I have a query in to the Hyper-V team 
> > to see if they are trapping WFE and just returning, vs. perhaps the
> > eMAG processor takes the easy way out and has WFE just return
> > immediately.  I'm not knowledgeable about other uses of timer
> > event streams, so let me know if there are other usage scenarios
> > I should check.
> 
> I saw your reply confirming that this is gnerally working as expected
> (and that Hyper-V is not trapping WFE) so this sounds fine to me.
> 
> > Finally, the "arch_sys_counter" clocksource gets initialized and
> > setup correctly.  If the Hyper-V clocksource is also initialized,
> > you can flip between the two clocksources at runtime as expected.
> > If the Hyper-V clocksource is not setup, then Linux in the VM runs
> > fine with the "arch_sys_counter" clocksource.
> 
> Great!
> 
> As above, my remaining concern here is fragility around the
> clockevent_device; I'm not keen that we're lying (in the GTDT) that
> interrupts are wired up when they not functional, and while you can get
> away with that today, that relies on kernel implementation details that
> could change.
> 
> Ideally, Hyper-V would provide the architectural timer (as it's already
> claiming to in the GTDT), things would "just work", and the Hyper-V
> timer would be an optimization rather than a functional necessity.
> 
> You mentioned above that Hyper-V will virtualize the timer "at some
> point" -- is that already planned, and when is that likely to be?
> 
> Marc, do you have any thoughts on this?

Overall, lying to the kernel is a bad idea. Only implementing half of
the architecture is another bad idea. I doubt the combination of two
bad ideas produces a good one.

If Hyper-V guests need to use another timer (for migration purposes?),
that's fine. But we rely on both the base architecture to be
completely implemented *and* on advertised features to be functional.
I think this has been our position since the first Hyper-V patches
were posted... 3 years ago?

What is the hold up for reliably virtualising the arch timer,
including interrupt delivery?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.