Re: [PATCH 2/4] arm64: hyperv: Add support for Hyper-V as a hypervisor

Marc Zyngier <marc.zyngier@xxxxxxx> · Thu, 13 Dec 2018 11:23:23 +0000

Hi Michael,

On 12/12/2018 05:00, Michael Kelley wrote:
> From: Marc Zyngier <marc.zyngier@xxxxxxx>  Sent: Friday, December 7, 2018 6:43 AM
> 
>>> Add ARM64-specific code to enable Hyper-V. This code includes:
>>> * Detecting Hyper-V and initializing the guest/Hyper-V interface
>>> * Setting up Hyper-V's synthetic clocks
>>> * Making hypercalls using the HVC instruction
>>> * Setting up VMbus and stimer0 interrupts
>>> * Setting up kexec and crash handlers
>>
>> This commit message is a clear indication that this should be split in
>> at least 5 different patches.
> 
> OK, I'll work on separating into multiple layered patches in the next
> version.

Thanks. This will definitely help the review process.

>>> +/*
>>> + * This variant of HVC invocation is for hv_get_vpreg and
>>> + * hv_get_vpreg_128. The input parameters are passed in registers
>>> + * along with a pointer in x4 to where the output result should
>>> + * be stored. The output is returned in x15 and x16.  x18 is used as
>>> + * scratch space to avoid buildng a stack frame, as Hyper-V does
>>> + * not preserve registers x0-x17.
>>> + */
>>> +ENTRY(hv_do_hvc_fast_get)
>>> +	mov x18, x4
>>> +	hvc #1
>>> +	str x15,[x18]
>>> +	str x16,[x18,#8]
>>> +	ret
>>> +ENDPROC(hv_do_hvc_fast_get)
>>
>> As Will said, this isn't a viable option. Please follow SMCCC 1.1.
> 
> I'll have to start a conversation with the Hyper-V team about this.
> I don't know why they chose to use HVC #1 or this register scheme
> for output values.  It may be tough to change at this point because
> there are Windows guests on Hyper-V for ARM64 that are already
> using this approach.

I appreciate you already have stuff in the wild, but there is definitely
a case to be made for supporting architecturally specified mechanisms in
a hypervisor, and SMCCC is definitely part of it (I'm certainly curious
of how you support the Spectre mitigation otherwise).

> 
>>
>>> diff --git a/arch/arm64/hyperv/hv_init.c b/arch/arm64/hyperv/hv_init.c
>>> new file mode 100644
>>> index 000000000000..aa1a8c09d989
>>> --- /dev/null
>>> +++ b/arch/arm64/hyperv/hv_init.c
>>> @@ -0,0 +1,441 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +/*
>>> + * Initialization of the interface with Microsoft's Hyper-V hypervisor,
>>> + * and various low level utility routines for interacting with Hyper-V.
>>> + *
>>> + * Copyright (C) 2018, Microsoft, Inc.
>>> + *
>>> + * Author : Michael Kelley <mikelley@xxxxxxxxxxxxx>
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms of the GNU General Public License version 2 as published
>>> + * by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful, but
>>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
>>> + * NON INFRINGEMENT.  See the GNU General Public License for more
>>> + * details.
>>> + */
>>> +
>>> +
>>> +#include <linux/types.h>
>>> +#include <linux/version.h>
>>> +#include <linux/export.h>
>>> +#include <linux/vmalloc.h>
>>> +#include <linux/mm.h>
>>> +#include <linux/clocksource.h>
>>> +#include <linux/sched_clock.h>
>>> +#include <linux/acpi.h>
>>> +#include <linux/module.h>
>>> +#include <linux/hyperv.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/cpuhotplug.h>
>>> +#include <linux/psci.h>
>>> +#include <asm-generic/bug.h>
>>> +#include <asm/hypervisor.h>
>>> +#include <asm/hyperv-tlfs.h>
>>> +#include <asm/mshyperv.h>
>>> +#include <asm/sysreg.h>
>>> +#include <clocksource/arm_arch_timer.h>
>>> +
>>> +static bool	hyperv_initialized;
>>> +struct		ms_hyperv_info ms_hyperv;
>>> +EXPORT_SYMBOL_GPL(ms_hyperv);
>>
>> Who are the users of this structure? Should they go via accessors instead?
> 
> The structure is an aggregation of several flags fields that describe a myriad
> of features and hints that may or may not be present on any particular version
> of Hyper-V, plus the max virtual processor ID values.  Everything is read-only
> after initialization.   

nit: please consider using a __ro_after_init annotation in that case.

> Most of the references are to test one of the flags.  It's
> a judgment call, but there are a lot of different flags with long names, and
> writing accessors for each one doesn't seem to me to add any clarity.

Looking at that structure, it doesn't seem so bad, and you could easily
have generic accessors that take a flag as a parameter. Your call.

[...]

>>> +static struct clocksource hyperv_cs_msr = {
>>> +	.name		= "hyperv_clocksource_msr",
>>> +	.rating		= 400,
>>> +	.read		= read_hv_clock_msr,
>>> +	.mask		= CLOCKSOURCE_MASK(64),
>>> +	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
>>> +};
>>> +
>>> +struct clocksource *hyperv_cs;
>>> +EXPORT_SYMBOL_GPL(hyperv_cs);
>>
>> Why? Who needs to poke this?
> 
> It's referenced in the architecture independent driver code
> for the Hyper-V clocksource in drivers/hv/hv.c, and in the
> code to sync the time with the Hyper-V host in
> drivers/hv/hv_util.c.

Fair enough.

>>> +static int hv_cpu_init(unsigned int cpu)
>>> +{
>>> +	u64 msr_vp_index;
>>> +
>>> +	hv_get_vp_index(msr_vp_index);
>>> +
>>> +	hv_vp_index[smp_processor_id()] = msr_vp_index;
>>> +
>>> +	if (msr_vp_index > hv_max_vp_index)
>>> +		hv_max_vp_index = msr_vp_index;
>>> +
>>> +	return 0;
>>> +}
>>
>> Is that some new way to describe a CPU topology? If so, why isn't that
>> exposed via the ACPI tables that the kernel already parses?
> 
> Hyper-V's hypercall interface uses vCPU identifiers that are not
> guaranteed to be consecutive integers or to match what ACPI shows.
> No topology information is implied -- it's just unique identifiers.  The 
> hv_vp_index array provides easy mapping from Linux's consecutive
> integer IDs for CPUs when needed to construct hypercall arguments.

That's extremely odd. The hypervisor obviously knows which vCPU is doing
a hypercall, and if referencing another vCPU, the virtualized MPIDR_EL1
value should be used. I don't think deviating from the architecture is a
good idea (but I appreciate this is none of your doing). Following the
architecture would allow this code to directly use the cpu_logical_map
infrastructure we alreadu have.

> 
>>
>>> +
>>> +/*
>>> + * This function is invoked via the ACPI clocksource probe mechanism. We
>>> + * don't actually use any values from the ACPI GTDT table, but we set up
>>
>> This doesn't feel like a good idea at all. Piggy-backing on an existing
>> mechanism and use it for something completely different is not exactly
>> future-proof.
>>
>> Also, if this is supposed to be a clocksource, why isn't that a
>> clocksource driver on its own right?
> 
> I agree this is not the right long term solution.  Is there a better place to
> hang the initialization code?  Or should I just make an explicit call to
> initialize Hyper-V at the right place?  On the x86 side, there's an
> explicit framework for hypervisor-specific initialization routines to plug
> into.  Maybe it's time for a basic version of such a framework on the
> ARM64 side.  Thoughts on the best approach, both in the short-term and
> the longer-term? If we put a framework in place, does that need to
> happen before adding Hyper-V code, or afterwards as a cleanup?

If we're introducing an infrastructure, doing it before introducing more
stuff seems to be the logical option. But I'm not sure we really need
such an infrastructure in this case, unless we need to enforce some
specific ordering.

You could start by moving all the clocksource stuff to
drivers/clocksource and keep the same init mechanism for the time being.

> 
> And yes, Hyper-V does effectively have its own clocksource.  The
> main code is in drivers/hv/hv.c, but it's not broken out as a separate
> driver in drivers/clocksource, probably due to some history on the
> x86 side that pre-dates me.  I'll have to research.

OK.

> 
>>
>>> + * the Hyper-V synthetic clocksource and do other initialization for
>>> + * interacting with Hyper-V the first time.  Using early_initcall to invoke
>>> + * this function is too late because interrupts are already enabled at that
>>> + * point, and sched_clock_register must run before interrupts are enabled.
>>> + *
>>> + * 1. Setup the guest ID.
>>> + * 2. Get features and hints info from Hyper-V
>>> + * 3. Setup per-cpu VP indices.
>>> + * 4. Register Hyper-V specific clocksource.
>>> + * 5. Register the scheduler clock.
>>> + */
>>> +
>>> +static int __init hyperv_init(struct acpi_table_header *table)
>>> +{
>>> +	struct hv_get_vp_register_output result;
>>> +	u32	a, b, c, d;
>>> +	u64	guest_id;
>>> +	int	i;
>>> +
>>> +	/*
>>> +	 * If we're in a VM on Hyper-V, the ACPI hypervisor_id field will
>>> +	 * have the string "MsHyperV".
>>> +	 */
>>> +	if (strncmp((char *)&acpi_gbl_FADT.hypervisor_id, "MsHyperV", 8))
>>> +		return 1;
>>> +
>>> +	/* Setup the guest ID */
>>> +	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>>> +	hv_set_vpreg(HV_REGISTER_GUEST_OSID, guest_id);
>>> +
>>> +	/* Get the features and hints from Hyper-V */
>>> +	hv_get_vpreg_128(HV_REGISTER_PRIVILEGES_AND_FEATURES, &result);
>>> +	ms_hyperv.features = lower_32_bits(result.registervaluelow);
>>> +	ms_hyperv.misc_features = upper_32_bits(result.registervaluehigh);
>>> +
>>> +	hv_get_vpreg_128(HV_REGISTER_FEATURES, &result);
>>> +	ms_hyperv.hints = lower_32_bits(result.registervaluelow);
>>> +
>>> +	pr_info("Hyper-V: Features 0x%x, hints 0x%x\n",
>>> +		ms_hyperv.features, ms_hyperv.hints);
>>> +
>>> +	/*
>>> +	 * Direct mode is the only option for STIMERs provided Hyper-V
>>> +	 * on ARM64, so Hyper-V doesn't actually set the flag.  But add the
>>> +	 * flag so the architecture independent code in drivers/hv/hv.c
>>> +	 * will correctly use that mode.
>>> +	 */
>>> +	ms_hyperv.misc_features |= HV_STIMER_DIRECT_MODE_AVAILABLE;
>>> +
>>> +	/*
>>> +	 * Hyper-V on ARM64 doesn't support AutoEOI.  Add the hint
>>> +	 * that tells architecture independent code not to use this
>>> +	 * feature.
>>> +	 */
>>> +	ms_hyperv.hints |= HV_DEPRECATING_AEOI_RECOMMENDED;
>>> +
>>> +	/* Get information about the Hyper-V host version */
>>> +	hv_get_vpreg_128(HV_REGISTER_HYPERVISOR_VERSION, &result);
>>> +	a = lower_32_bits(result.registervaluelow);
>>> +	b = upper_32_bits(result.registervaluelow);
>>> +	c = lower_32_bits(result.registervaluehigh);
>>> +	d = upper_32_bits(result.registervaluehigh);
>>> +	pr_info("Hyper-V: Host Build %d.%d.%d.%d-%d-%d\n",
>>> +		b >> 16, b & 0xFFFF, a, d & 0xFFFFFF, c, d >> 24);
>>> +
>>> +	/* Allocate percpu VP index */
>>> +	hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
>>> +				    GFP_KERNEL);
>>
>> Why isn't this a percpu variable?
> 
> In current code in the architecture independent Hyper-V drivers (as well
> as some future Hyper-V enlightenments that aren't yet implemented
> for ARM64), the running CPU needs to get the VP index values for any CPUs
> in the hv_vp_index array.  Some of the code is performance sensitive, and
> accessing a global array is faster than accessing other CPUs' per-cpu data.

Fair enough. But his seems to tie into the above discussion about the
use of MPIDR_EL1 and the logical CPU mapping.

[...]

>>> +free_vp_index:
>>> +	kfree(hv_vp_index);
>>> +	hv_vp_index = NULL;
>>> +	return 1;
>>
>> ????
> 
> The return value is there because this function is implemented as a timer
> initialization function, and that's what the function signature requires.
> But maybe I'm not understanding your question.

I'm slightly miffed by the "return 1". I'd expect explicit negative
return codes that describe the error.

[...]

>>> +	vmbus_irq = acpi_register_gsi(NULL, HYPERVISOR_CALLBACK_VECTOR,
>>> +				 ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
>>> +	if (vmbus_irq <= 0) {
>>> +		pr_err("Can't register Hyper-V VMBus GSI. Error %d",
>>> +			vmbus_irq);
>>> +		vmbus_irq = 0;
>>> +		return;
>>> +	}
>>> +	vmbus_evt = alloc_percpu(long);
>>> +	result = request_percpu_irq(vmbus_irq, hyperv_vector_handler,
>>> +			"Hyper-V VMbus", vmbus_evt);
>>
>> If this is a per-cpu interrupt, why isn't it signalled as a PPI, in an
>> architecture compliant way?
> 
> Except for the code in this module, the interrupt handler for VMbus
> interrupts is architecture independent.  But there's no support for
> per-process interrupts on the x86, so the hypervisor interrupt vectors
> are hard coded in the same IDT entry across all processors, and the
> normal IRQ allocation mechanism is bypassed.  The above approach
> assigns an ARM64 PPI (HYPERVISOR_CALLBACK_VECTOR is 16) in a
> way that works with the arch independent interrupt handler.
> 
> Or maybe I'm missing your point.  If so, please set me straight.

Sorry, I missed that HYPERVISOR_CALLBACK_VECTOR was a PPI. It looks OK now.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel