Re: [PATCH 0/2] hyperv: Move some features to common code

Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx> · Wed, 22 Jan 2025 15:05:06 -0800

On 12/17/2024 9:48 AM, Michael Kelley wrote:
> From: Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx> Sent: Monday, December 9, 2024 12:20 PM
>>
>> On 12/7/2024 6:59 PM, Michael Kelley wrote:
>>> From: Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx> Sent: Friday, December 6, 2024 2:22 PM
>>>>
>>>> There are several bits of Hyper-V-related code that today live in
>>>> arch/x86 but are not really specific to x86_64 and will work on arm64
>>>> too.
>>>>
>>>> Some of these will be needed in the upcoming mshv driver code (for
>>>> Linux as root partition on Hyper-V).
>>>
>>> Previously, Linux as the root partition on Hyper-V was x86 only, which is
>>> why the code is currently under arch/x86. So evidently the mshv driver
>>> is being expanded to support both x86 and arm64, correct? Assuming
>>> that's the case, I have some thoughts about how the source code should
>>> be organized and built. It's probably best to get this right to start with so
>>> it doesn't need to be changed again.
>>
>> Yes, we plan on supporting both architectures (eventually). I completely agree
>> that it's better to sort out these issues now rather than later.
>>
>>>
>>> * Patch 2 of this series moves hv_call_deposit_pages() and
>>>    hv_call_create_vp() to common code, but does not move
>>>    hv_call_add_logical_proc(). All three are used together, so
>>>    I'm wondering why hv_call_add_logical_proc() isn't moved.
>>>
>>
>> The only reason is that in our internal tree there's no common or arm64 code
>> yet that uses it - there is no reason it can't also become common code!
> 
> Maybe I'm missing your point, but hv_call_add_logical_proc() and
> hv_call_create_vp() are called in succession in hv_smp_prepare_cpus(),
> so it seems like they very much go together. Presumably a similar
> sequence will be needed on the arm64 side when running as the
> root partition?
> 

Yes that's right, sorry I wasn't being too clear.

>>
>>> * These three functions were originally put in a separate source
>>>    code file because of being specific to running in the root partition,
>>>    and not needed for generic Linux guest support. I think there's
>>>    value in keeping them in a separate file, rather than merging them
>>>    into hv_common.c. Maybe just move the entire hv_proc.c file?
>>
>> Agreed. I think it should be renamed too - this file will eventually
>> contain some additional hypercall helper functions, some of which may also be
>> shared by the driver code. Something like "hv_call_common.c"?
> 
> I went back and looked at your patch series from a year ago [1], and
> got a better understanding of where this work is headed. I wanted
> to comment on that series back then, but got subsumed in wrapping
> things up for my retirement. :-(  I see significant portions of that
> series have been posted independently and accepted, so my further
> comments here assume the rest of the series is still the macro-level
> approach you are taking.
> 
>>From that series, you are planning three modules, controlled by
> CONFIG_MSHV, CONFIG_MSHV_ROOT, and CONFIG_MSHV_VTL.
> That makes sense, and addresses one of my top-level concerns,
> which is that normal guest kernels shouldn't include all that code.
> And apparently that code works as a module as well as built-in.
> 
> The code currently in hv_proc.c is similar to the code in hv_call.c
> and mshv_root_hv_call.c from that series -- it's a wrapper around
> Hyper-V hypercalls. But a difference is that the code in hv_proc.c
> can't be a module because it is called from hv_smp_prepare_cpus(),
> which must be built-in. So it can't be added to the proposed
> hv_call.c without making all of hv_call.c built-in. Ideally, there
> would be a separate source file just for the code that must be
> built-in. I'm not familiar enough with root partition requirements
> to understand what hv_smp_prepare_cpus() and its calls to
> hv_call_add_logical_proc() and hv_call_create_vp() are doing.
> Evidently it is related to bringing up CPUs in the root partition,
> and not related to guest VMs. But those two hv_call_* functions
> would also be used for managing guest VMs from /dev/mshv.
> 

It makes sense to keep the different kinds of functionality
separated based on where they are needed. I do wonder if it will be
cumbersome to keep all the cases in their own files - builtin/module,
root/guest, and separated into arch directories vs drivers/hv...

> As for the name, I don't really like "common", even though I'm the
> one who created "hv_common.c" back when doing the initial arm64
> support on Hyper-V. :-( My thinking is that anything that isn't under
> arch/x86 or arch/arm64 is by definition shared across architectures,
> so having "common" in the name is superfluous. Maybe just
> staying with hv_proc.c is OK.
> 

I'll stick with hv_proc.c.

>>
>>>    And then later, perhaps move the entire irqdomain.c file as well?
>> Yes, may as well move it too.
> 
> irqdomain.c is also in that category of needing to be built-in. But
> looking more closely, it is x86 specific and should stay where it is. I
> can't immediately tell whether it's feasible to make the Hyper-V
> IOMMU driver (and irqdomain.c) architecture neutral, or whether
> a separate arm64 implementation will be needed. My guess is the
> latter.
> 
>>
>>>    There's also an interesting question of whether to move them into
>>>    drivers/hv, or create a new directory virt/hyperv. Hyper-V support
>>>    started out 15 years ago structured as a driver, hence "drivers/hv".
>>>    But over the time, the support has become significantly more than
>>>    just a driver, so "virt/hyperv" might be a better location for
>>>    non-driver code that had previously been under arch/x86 but is
>>>    now common to all architectures.
>>>
>> I'd be fine with using "virt/hyperv", but I thought "virt" was only for
>> KVM.
> 
> Now that I see the bigger picture from your previous patch series,
> keeping the files in drivers/hv seems OK to me. The 'mshv' code *is*
> a driver that implements /dev/mshv. :-)
> 
>>
>> Another option would be to create subdirectories in "drivers/hv" to
>> organize the different modules more cleanly (i.e. when the /dev/mshv
>> driver code is introduced).
> 
> Putting all the mshv and "running as root partition" files in a single
> sub-directory might make sense. Multiple sub-directories might be
> overkill. But I don't have a strong opinion either way. Putting them
> all directly in drivers/hv seems OK, as does one or more sub-directories.
> 
> One thing I encountered back when first doing the arm64 support is
> that everything in the drivers/hv directory could be built as a module.
> The Hyper-V support under arch/x86 was always built-in, and the
> arch/arm64 code I added was as well. But there was no obvious place
> to put common code that must always be built-in. At the time, I thought
> about introducing virt/hyperv, but had only a single relatively small source
> code file, and introducing that new pathname with its own Makefile, etc.
> seemed like overkill. So drivers/hv/hv_common.c came into existence. I
> tweaked the existing drivers/hv/Makefile so hv_common.c is always
> built-in even when CONFIG_HYPERV=m. It's a little messy, but not too
> bad.
> 
> The difference in what can be built as a module vs. must be built-in
> might be a factor in the directory structure, along with the
> CONFIG options that control whether the root partition code
> gets built at all (see below). The details will govern what works
> well and what ends up being a bit of a mess.
> 
>>
>>> * Today, the code for running in the root partition is built along
>>>    with the rest of the Hyper-V support, and so is present in kernels
>>>    built for normal Linux guests on Hyper-V. I haven't thought about
>>>    all the implications, but perhaps there's value in having a CONFIG
>>>    option to build for the root partition, so that code can be dropped
>>>    from normal kernels. There's a significant amount of new code still
>>>    to come for mshv that could be excluded from normal guests in this
>>>    way. Also, the tests of the hv_root_partition variable could be
>>>    changed to a function the compiler detects is always "false" in a
>>>    kernel built without the CONFIG option, in which case it can drop
>>>    the code for where hv_root_partition is "true".
>>>
>> Using hv_root_partition is a good way to do it, since it won't require
>> many #ifdefs or moving the existing code around too much.
>>
>> I can certainly give it a try, and create a separate patch series
>> introducing the option. I suppose "CONFIG_HYPERV_ROOT" makes sense as a
>> name?
> 
> Now that I see you have CONFIG_MSHV, CONFIG_MSHV_ROOT, and
> CONFIG_MSHV_VTL planned, could building the hv_root_partition code
> just be under control of CONFIG_MSHV_ROOT without introducing
> another CONFIG variable? Is any of the root partition code like
> hv_proc.c and irqdomain.c needed if CONFIG_MSHV_ROOT=n?

I will experiment with it - I think we can probably do it that way, yes.

> Stubs will be needed for functions called from the generic kernel code
> when CONFIG_MSHV_ROOT=n, but those are easily supplied in
> asm/mshyperv.h. If hv_root_partition becomes a function whose
> return value is gated by CONFIG_MSHV_ROOT, then the compiler
> can know when the value is always "false" and drop even the code
> that calls the stubs. But I'm pretty sure the stubs are still needed to
> avoid compile errors, even when the compiler drops the code.
> 
> With this approach, you can avoid #ifdefs except in asm/mshyperv.h
> for the stubs, and in the hv_root_partition() function.
> 
> One more code structure topic:  In the previous patch series,
> some of the mshv code is x86 specific. There is x86 assembler
> code, and references to x86 specific registers (all the HV_X64_*
> registers). Do you plan to put architecture specific code under
> drivers/hv, or under arch/[x86/arm64]/hyperv? We've made
> drivers/hv be architecture neutral -- currently there's only one
> "cheat" with an #ifdef CONFIG_ARM64 in the hv_balloon.c driver
> that turns off some functionality.

Today in our internal tree, all the module code lives in drivers/hv.
Some of the code is arch-neutral "in-theory", but uses hypervisor
features that are not yet supported on arm64, (but may be in future.)

For such features, we use #ifdefs in drivers/hv to conditionally compile
it. Note that today we only compile test arm64, we don't actually have
a running build for it yet, so the driver will be x86_64 only at first.

There is very little driver code that actually deals with architectural
details. If necessary, I imagine we would create arch-specific files
inside drivers/hv at first, and maybe move them to arch/ if appropriate.

> 
> As we discussed previously, the new Hyper-V #include files
> provide the union of x86 and arm64 definitions, with just an
> occasional #ifdef where needed. That was justified because
> that's how the definitions come from the Windows/Hyper-V
> world. But it seems like the mshv code should be structured
> with arch specific code is under arch, not in drivers/hv with
> #ifdefs. The Makefiles in arch/[x86,arm64]/hyperv will need
> to use CONFIG_MSHV, CONFIG_MSHV_ROOT, and
> CONFIG_MSHV_VTL to decide what to build, and whether to
> build as a module vs. built-in.
> 
> Maybe putting the mshv code in a "mshv" subdirectory under
> both drivers/hv and arch/[x86,arm64]/hyperv would conceptually
> help tie together the arch neutral and arch specific portions.
> Or something like that ....
> 

I think it could evolve to that point, but it's hard to say until we
get closer.

>>
>>> * The code currently in hv_proc.c is built for x86 only, and validly
>>>    assumes the page size is 4K. But when the code moves to be
>>>    common across architectures, that assumption is no longer
>>>    valid in the general case. Perhaps the intent is that kernels for
>>>    the root partition should always be built with page size 4K on
>>>    arm64, but nothing enforces that intent. Personally, I think the code
>>>    should be made to work with page sizes other than 4K so as to not
>>>    leave technical debt. But I realize you may have other priorities. If
>>>    there were a CONFIG option for building for the root partition,
>>>    that option could be setup to enforce the 4K page size on arm64.
>>>
>> That makes sense. I suppose this can be done by selecting PAGE_SIZE_4KB
>> under HYPERV in drivers/hv/Kconfig?
> 
> Since the PAGE_SIZE value is independently selectable in the .config
> file, I'm not sure if you can override it when CONFIG_MSHV_ROOT is
> set. But you can allow CONFIG_MSHV_ROOT to be set only if
> PAGE_SIZE_4KB is selected on arm64. I'd have to look in more detail
> to figure out the best way to create an appropriate dependency.
> 
>>
>> I'm not how easy it will be to make the code work with different page
>> sizes, since we use alloc_page() and similar in a few places, assuming 4k.
> 
> alloc_page() is not the problem as it is relatively easy to break up a
> 16K or 64K page into multiple 4K pages when depositing memory.
> And you can round up the amount of deposited memory to the
> larger boundary without doing any harm. But in your previous patch
> series, I see hv_call_withdraw_memory(), wherein Hyper-V gives
> back individual 4K pages in no particular order. That's the killer, as
> it is not feasible to re-assemble a random set of 4K pages into larger
> 16K or 64K pages for free_page().
> 
> So scratch that idea. :-( The root partition must run with a page
> size that matches the size Hyper-V uses to do memory deposits to
> and from a partition, and that's 4K.

Yes that does seem to clinch it. There will have to be a dependency.

Thanks for the detailed responses, sorry it took me some time to get to.

Nuno

> 
> Michael
> 
> [1] https://lore.kernel.org/lkml/1696010501-24584-1-git-send-email-nunodasneves@xxxxxxxxxxxxxxxxxxx/