RE: [PATCH 2/2] hyperv: Do not overlap the input and output hypercall areas in get_vtl(void)

Michael Kelley <mhklinux@xxxxxxxxxxx> · Thu, 19 Dec 2024 21:37:48 +0000

From: Roman Kisel <romank@xxxxxxxxxxxxxxxxxxx> Sent: Thursday, December 19, 2024 10:19 AM
> 
> On 12/18/2024 6:42 PM, Wei Liu wrote:
> > On Wed, Dec 18, 2024 at 12:54:21PM -0800, Roman Kisel wrote:

[...]

> >> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> >> index c7185c6a290b..90c9ea00273e 100644
> >> --- a/arch/x86/hyperv/hv_init.c
> >> +++ b/arch/x86/hyperv/hv_init.c
> >> @@ -422,7 +422,7 @@ static u8 __init get_vtl(void)
> >>
> >>   	local_irq_save(flags);
> >>   	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> >> -	output = (struct hv_get_vp_registers_output *)input;
> >> +	output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> >
> > You can do
> >
> > 	output = (char *)input + HV_HYP_PAGE_SIZE / 2;
> >
> > to avoid the extra allocation.
> >
> > The input and output structures surely won't take up half of the page.
> Agreed on the both counts! I do think that the attempt to save here
> won't help much: the hypercall output per-CPU pages in the VTL mode are
> needed just as in the dom0/root partition mode because this hypercall
> isn't going to be the only one required.
> 
> In other words, we will have to allocate these pages anyway as we evolve
> the code; we are trying to save here what is going to be spent anyway.
> Sort of, kicking the can down the road as the saying goes :)
> 
> I do understand that within the code that is already merged, there is
> just one this place in this function where the hypercall that returns
> data is used. And the proposed approach makes the code self-explanatory:
> ```
> output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> ```
> 
> as opposed to
> 
> ```
> output = (char *)input + HV_HYP_PAGE_SIZE / 2;
> ```
> 
> or, as it existed,
> 
> ```
> output = (struct hv_get_vp_registers_output *)input;
> ```
> 
> which both do require a good comment I believe.
> 
> There will surely be more hypercall usage in the VTL mode that return
> data and require the output pages as we progress with upstreaming the
> VTL patches. Enabling the hypercall output pages allows to fix the
> function in question in a very natural way, making it possible to
> replace with some future `hv_get_vp_register` that would work for both
> dom0 and VTL mode just the same.
> 
> All told, if you believe that we should make this patch a one-liner,
> I'll do as you suggested.
> 

FWIW, Roman and I had this same discussion back in August. See [1].
I'll add one new thought that wasn't part of the August discussion.

To my knowledge, the hypercalls that *may* use a full page for input
and/or a full page for output don't actually *require* a full page. The
size of the input and output depends on how many "entries" the
hypercall is specified to process, where "entries" could be registers,
memory PFNs, or whatever. I would expect the code to invoke these
hypercalls must already deal with the case where the requested number
of entries causes the input or output size to exceed one page, so the
code just iterates making multiple invocations of the hypercall with
a "batch size" that fits in one page.

It would be perfectly reasonable to limit the batch size so that a
"batch" of input or output fits in a half page instead of a full page,
avoiding the need to allocate hyperv_pcpu_output_arg. Or if the
input and output sizes are not equal, use whatever input vs. output
partitioning of a single page make sense for that hypercall. The
tradeoff, of course, is having to make the hypercall more times
with smaller batches. But if the hypercall is not in a hot path, this
might be a reasonable tradeoff to avoid the additional memory
allocation. Or if the hypercall processing time per "entry" is high,
the added overhead of more invocations with smaller batches is
probably negligible compared to the time processing the entries.

This scheme could also be used in the existing root partition code
that is currently the only user of the hyperv_pcpu_output_arg.
I could see a valid argument being made to drop
hyperv_pcpu_output_arg entirely and just use smaller batches.

Or are there hypercalls where a smaller batch size doesn't work
at all or is a bad tradeoff for performance reasons? I know I'm not
familiar with *all* the hypercalls that might be used in root
partition or VTL code. If there are such hypercalls, I would be curious
to know about them.

Michael

[1] https://lore.kernel.org/linux-hyperv/SN6PR02MB415759676AEF931F030430FDD4BE2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/