From: Marc Zyngier <maz@xxxxxxxxxx> Sent: Wednesday, June 23, 2021 1:56 AM
>
> On Tue, 22 Jun 2021 10:54:12 +0100,
> Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > Hi Michael,
> >
> > Thanks for all this; comments inline below. I've added Marc Zyngier, who
> > co-maintains the architected timer code.
> >
> > On Mon, Jun 14, 2021 at 02:42:23AM +0000, Michael Kelley wrote:
> > > From: Mark Rutland <mark.rutland@xxxxxxx> Sent: Thursday, June 10, 2021 9:45 AM
> > > > On Tue, Jun 08, 2021 at 03:36:06PM +0000, Michael Kelley wrote:
> > > > > I've had a couple of rounds of discussions with the Hyper-V team. For
> > > > > the clocksource we've agreed to table the live migration discussion, and
> > > > > I'll resubmit the code so that arm_arch_timer.c provides the
> > > > > standard arch_sys_counter clocksource. As noted previously, this just
> > > > > works for a Hyper-V guest. The live migration discussion may come
> > > > > back later after a deeper investigation by Hyper-V.
> > > >
> > > > Great; thanks for this!
> > > >
> > > > > For clockevents, there's no near-term fix. It's more than just plumbing
> > > > > an interrupt for Hyper-V to virtualize the ARM64 arch timer in a guest VM.
> > > > > From their perspective there's also benefit in having a timer abstraction
> > > > > that's independent of the architecture, and in the Linux guest, the STIMER
> > > > > code is common across x86/x64 and ARM64. It follows the standard Linux
> > > > > clockevents model, as it should. The code is already in use in out-of-tree
> > > > > builds in the Linux VMs included in Windows 10 on ARM64 as part of the
> > > > > so-called "Windows Subsystem for Linux".
> > > > >
> > > > > So I'm hoping we can get this core support for ARM64 guests on Hyper-V
> > > > > into upstream using the existing STIMER support. At some point, Hyper-V
> > > > > will do the virtualization of the ARM64 arch timer, but we don't want to
> > > > > have to stay out-of-tree until after that happens.
> > > >
> > > > My main concern here is making sure that we can rely on architected
> > > > properties, and don't have to special-case architected bits for hyperv
> > > > (or any other hypervisor), since that inevitably causes longer-term
> > > > pain.
> > > >
> > > > While in the abstract I'm not as worried about using the timer
> > > > clock_event_device specifically, that same driver provides the
> > > > clocksource and the event stream, and I want those to work as usual,
> > > > without being tied into the hyperv code. IIUC that will require some
> > > > work, since the driver won't register if the GTDT is missing timer
> > > > interrupts (or if there is no GTDT).
> > > >
> > > > I think it really depends on what that looks like.
> > >
> > > Mark,
> > >
> > > Here are the details:
> > >
> > > The existing initialization and registration code in arm_arch_timer.c
> > > works in a Hyper-V guest with no changes. As previously mentioned,
> > > the GTDT exists and is correctly populated. Even though it isn't used,
> > > there's a PPI INTID specified for the virtual timer, just so
> > > the "arm_sys_timer" clockevent can be initialized and registered.
> > > The IRQ shows up in the output of "cat /proc/interrupts" with zero counts
> > > for all CPUs since no interrupts are ever generated. The EL1 virtual
> > > timer registers (CNTV_CVAL_EL0, CNTV_TVAL_EL0, and CNTV_CTL_EL0)
> > > are accessible in the VM.
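To show what "accessible" means in practice, here's a minimal sketch of
reading those registers with the kernel's generic arm64 sysreg
accessors. Illustrative only: the function name is made up, and this is
not code from arm_arch_timer.c.

/*
 * Illustrative sketch: read the EL1 virtual timer registers from an
 * arm64 guest. In a Hyper-V VM these reads behave as architected.
 */
#include <linux/printk.h>
#include <linux/types.h>
#include <asm/sysreg.h>

static void dump_virt_timer_regs(void)
{
        u64 ctl  = read_sysreg(cntv_ctl_el0);   /* ENABLE, IMASK, ISTATUS */
        u64 cval = read_sysreg(cntv_cval_el0);  /* absolute compare value */
        u64 tval = read_sysreg(cntv_tval_el0);  /* signed down-count view of CVAL */

        pr_info("CNTV_CTL_EL0=%#llx CVAL=%#llx TVAL=%#llx\n", ctl, cval, tval);
}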
> > > The "arm_sys_timer" clockevent is left in
> > > a shutdown state with CNTV_CTL_EL0.ENABLE set to zero when the
> > > Hyper-V STIMER clockevent is registered with a higher rating.
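For context, the clockevents core prefers the highest-rated device on
each CPU, so outranking the arch timer's clockevent rating (450 in
arm_arch_timer.c) is all that's needed. A minimal sketch of that
mechanism follows; the names and the rating value 1000 are made up for
illustration, and the real registration lives in
drivers/clocksource/hyperv_timer.c.

/*
 * Illustrative sketch: registering a clock_event_device rated above
 * the arch timer's (450) makes the core switch the arch timer's
 * clockevent to the shutdown state, which clears CNTV_CTL_EL0.ENABLE.
 */
#include <linux/clockchips.h>
#include <linux/cpumask.h>
#include <linux/smp.h>
#include <linux/types.h>

static int example_stimer_set_next_event(unsigned long delta,
                                         struct clock_event_device *evt)
{
        /* program the hypervisor's synthetic timer to expire "delta" ticks out */
        return 0;
}

static struct clock_event_device example_stimer_ce = {
        .name           = "example_stimer",
        .features       = CLOCK_EVT_FEAT_ONESHOT,
        .rating         = 1000,         /* > 450, so it wins over arm_sys_timer */
        .set_next_event = example_stimer_set_next_event,
};

static void example_stimer_register(u32 freq_hz)
{
        /* per-cpu registration; called on the CPU being set up */
        example_stimer_ce.cpumask = cpumask_of(smp_processor_id());

        /* the core computes mult/shift and adopts this as the tick device */
        clockevents_config_and_register(&example_stimer_ce, freq_hz,
                                        1, 0x7fffffff);
}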
> >
> > This concerns me, since we're lying to the kernel, and assuming that it
> > will never try to use this timer. I appreciate that evidently we don't
> > happen to rely on that today if you register a higher-priority timer,
> > but that does open us up to future fragility (e.g. if we added sanity
> > checks when registering timers), and IIRC there are ways for userspace
> > to change the clockevent device today.
>
> Indeed. Userspace can perfectly well unbind the clockevent using
> /sys/devices/system/clockevents/clockevent*/unbind_device, and the
> kernel will be happy to switch to the next per-cpu timer, which
> happens to be the arch timer. Oh wait...
>
> > > Event streams are initialized and the __delay() implementation
> > > for ARM64 inside the kernel works. However, on the Ampere
> > > eMAG hardware I'm using for testing, the WFE instruction returns
> > > more quickly than it should even though the event stream fields in
> > > CNTKCTL_EL1 are correct. I have a query in to the Hyper-V team
> > > to see if they are trapping WFE and just returning, vs. perhaps the
> > > eMAG processor takes the easy way out and has WFE just return
> > > immediately. I'm not knowledgeable about other uses of timer
> > > event streams, so let me know if there are other usage scenarios
> > > I should check.
> >
> > I saw your reply confirming that this is generally working as expected
> > (and that Hyper-V is not trapping WFE), so this sounds fine to me.
> >
> > > Finally, the "arch_sys_counter" clocksource gets initialized and
> > > set up correctly. If the Hyper-V clocksource is also initialized,
> > > you can flip between the two clocksources at runtime as expected.
> > > If the Hyper-V clocksource is not set up, then Linux in the VM runs
> > > fine with the "arch_sys_counter" clocksource.
> >
> > Great!
> >
> > As above, my remaining concern here is fragility around the
> > clockevent_device; I'm not keen that we're lying (in the GTDT) that
> > interrupts are wired up when they're not functional, and while you can
> > get away with that today, that relies on kernel implementation details
> > that could change.
> >
> > Ideally, Hyper-V would provide the architectural timer (as it's already
> > claiming to in the GTDT), things would "just work", and the Hyper-V
> > timer would be an optimization rather than a functional necessity.
> >
> > You mentioned above that Hyper-V will virtualize the timer "at some
> > point" -- is that already planned, and when is that likely to be?
> >
> > Marc, do you have any thoughts on this?
>
> Overall, lying to the kernel is a bad idea. Only implementing half of
> the architecture is another bad idea. I doubt the combination of two
> bad ideas produces a good one.
>
> If Hyper-V guests need to use another timer (for migration purposes?),
> that's fine. But we rely on both the base architecture being
> completely implemented *and* on advertised features being functional.
> I think this has been our position since the first Hyper-V patches
> were posted... 3 years ago?
>
> What is the hold-up for reliably virtualising the arch timer,
> including interrupt delivery?

Marc (and Mark) --

In our early interactions about the Hyper-V clocks and timers, the code
was a bit spread out, and you suggested moving all the clocksource and
clockevent stuff to a driver under drivers/clocksource. See
https://lore.kernel.org/lkml/e0374a07-809c-cabd-2eb6-e6b5ad84742e@xxxxxxx/.
That was a good change independent of any ARM64 considerations, but I
read (or perhaps overread) your comments to say that it was OK to use
these Hyper-V para-virtualized clocks/timers instead of the ARM64
architectural ones in a Hyper-V VM. They work, and it's what the Hyper-V
guys wanted to do anyway, so having Hyper-V offer the ARM64 arch counter
and timer in a VM hasn't been a priority. They had other stuff that
didn't work at all on ARM64, so that's where their attention went.

I agree that it would be better to have the ARM64 arch counter/timer
fully implemented in a Hyper-V VM. But we want to find a practical way
to move forward, and in doing so confine any rough edges to Hyper-V VMs
and the Hyper-V-specific code in the kernel tree. We're maintaining and
shipping the code out-of-tree based on current Hyper-V ARM64 behavior
and would like to get this core enablement code upstream. Sure,
unbinding the Hyper-V clockevent doesn't work, but that's not a problem
in any use cases we see from customers.

All that said, our discussions with the Hyper-V team are continuing.
We're still in the process of seeing what's practical to get, and when.

Michael

> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.