On Mon, May 18, 2020 at 1:50 AM Anastassios Nanos <ananos@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, May 18, 2020 at 10:50 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > On 2020-05-18 07:58, Anastassios Nanos wrote:
> > > To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
> > > QEMU, or some other kind of VM monitor in user-space, to host the vCPU
> > > threads, I/O threads and various other book-keeping/management
> > > mechanisms. This is perfectly fine for a large number of reasons and
> > > use cases: for instance, running generic VMs, or running general-purpose
> > > operating systems that need some kind of emulation for legacy
> > > boot/hardware etc.
> > >
> > > What if we wanted to execute a small piece of code as a guest instance,
> > > without the involvement of user-space? The KVM functions are already
> > > doing what they should: VM and vCPU setup is already part of the kernel;
> > > the only missing piece is memory handling.
> > >
> > > With this series, (a) we expose to the Linux kernel the bare minimum of
> > > KVM API functions needed to spawn a guest instance without the
> > > intervention of user-space; and (b) we tweak the memory handling code of
> > > KVM-related functions to account for another kind of guest, spawned in
> > > kernel-space.
> > >
> > > PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces
> > > the changes in the KVM memory handling code for x86_64 and aarch64.
> > >
> > > An example of use is provided based on kvmtest.c
> > > [http://email.nubificus.co.uk/c/eJwdzU0LgjAAxvFPo0eZm1t62MEkC0xQScJTuBdfcGrpQuvTN4KHP7_bIygSDQfY7mkUXotbzQJQftIX7NI9EtEYofOW3eMJ6uTxTtIqz2B1LPhl-w6nMrc8MNa9ctp_-TzaHWUekxwfSMCRIA3gLvFrQAiGDUNE-MxWtNP6uVootGBsprbJmaQ2ChfdcyVXQ4J97EIDe6G7T8zRIJdJKmde2h_0WTe_] at
> > > http://email.nubificus.co.uk/c/eJwljdsKgkAYhJ9GL2X9NQ8Xe2GSBSaoJOFVrOt6QFdL17Sevq1gGPhmGKbERllRtFNb7Hvn9EIKF2Wv6AFNtPmlz33juMbXYAAR3pYwypMY8n1KT-u7O2SJYiJO2l6rf05HrjbYsCihRUEp2DYCgmyH2TowGeiVCS6oPW6EuM-K4SkQSNWtaJbiu5ZA-3EpOzYNrJ8ldk_OBZuFOuHNseTdv9LGqf4Apyg8eg
>
> Hi Marc,
>
> thanks for taking the time to check this!
>
> > You don't explain *why* we would want this. What is the overhead of
> > having a userspace if your guest doesn't need any userspace handling?
> > The kvmtest example indeed shows that the KVM userspace API is usable
> > without any form of emulation, hence has almost no cost.
>
> The rationale behind such an approach is two-fold:
>
> (a) We are able to ditch any user-space involvement in the creation and
> spawning of a KVM guest. This is particularly interesting in use cases
> where short-lived tasks are spawned on demand. Think of a scenario where
> an ABI-compatible binary is loaded in memory. Spawning it as a guest from
> userspace would incur a number of ioctls. Doing the same from the kernel
> involves the same number of operations, but they are now plain function
> calls; additionally, memory handling is simplified.
>
> (b) I agree that the userspace KVM API is usable without emulation for a
> simple task, written in bytecode, that adds two registers. But what about
> something more complicated? Something that needs I/O? For most use cases,
> I/O happens between the guest and some hardware device (network/storage
> etc.). Being in the kernel saves us from doing unnecessary mode switches.
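
For reference, the userspace path discussed in (a) and (b) above boils down
to a handful of KVM ioctls plus a run loop. A rough x86 sketch in the spirit
of the kvmtest.c sample is shown below; it is illustrative only, with error
handling omitted and the guest payload assumed to be supplied by the caller.

  /* Illustrative only: the rough ioctl sequence a minimal userspace
   * monitor (kvmtest.c-style) issues to run a tiny guest. Error
   * handling is omitted; `code` is an assumed 16-bit real-mode payload. */
  #include <fcntl.h>
  #include <linux/kvm.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>

  int run_tiny_guest(const unsigned char *code, size_t len)
  {
      int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
      int vm  = ioctl(kvm, KVM_CREATE_VM, 0);                /* ioctl #1 */

      /* Guest "RAM": one anonymous page holding the payload. */
      void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      memcpy(mem, code, len);

      struct kvm_userspace_memory_region region = {
          .slot            = 0,
          .guest_phys_addr = 0x1000,
          .memory_size     = 0x1000,
          .userspace_addr  = (unsigned long)mem,
      };
      ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);        /* ioctl #2 */

      int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);              /* ioctl #3 */
      int sz   = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);      /* ioctl #4 */
      struct kvm_run *run = mmap(NULL, sz, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, vcpu, 0);

      struct kvm_sregs sregs;
      ioctl(vcpu, KVM_GET_SREGS, &sregs);                    /* ioctl #5 */
      sregs.cs.base = 0; sregs.cs.selector = 0;              /* flat real mode */
      ioctl(vcpu, KVM_SET_SREGS, &sregs);                    /* ioctl #6 */

      struct kvm_regs regs = { .rip = 0x1000, .rflags = 0x2 };
      ioctl(vcpu, KVM_SET_REGS, &regs);                      /* ioctl #7 */

      ioctl(vcpu, KVM_RUN, 0);              /* ioctl #8: enter the guest */
      return run->exit_reason;              /* e.g. KVM_EXIT_HLT / _IO */
  }

Each of these ioctls is a system call from the monitor process; the point
made in (a) is that the equivalent setup done in the kernel turns them into
direct function calls.
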
> Of course there are optimizations for handling I/O on QEMU/KVM VMs
> (virtio/vhost), but essentially what they do is remove mode switches (and
> exits) for I/O operations -- is there a good reason not to address that
> directly? A guest running in the kernel exits because of an I/O request,
> which gets processed and forwarded directly to the relevant subsystem *in*
> the kernel (net/block etc.).
>
> We work on both directions, with a particular focus on (a) -- device I/O
> could be handled with other mechanisms as well (VFs, for instance).
>
> > Without a clear description of the advantages of your solution, as well
> > as a full-featured in-tree use case, I find it pretty hard to support
> > this.
>
> Totally understand that -- please keep in mind that this is a first (baby)
> step for what we call KVMM (kernel virtual machine monitor). We presented
> the architecture at FOSDEM along with some preliminary results regarding
> I/O. Of course, this is WIP and far from being upstreamable; hence the
> kvmmtest example showcasing the potential use case.
>
> To be honest, my main question is whether we are interested in such an
> approach in the first place, and then we can try to work on any rough
> edges. As far as I understand, you're not in favor of this approach.

The usual answer here is that the kernel is not in favor of adding
in-kernel functionality that is not used in the upstream kernel.

If you come up with a real use case, and that use case is GPL and has
plans for upstreaming, and that use case has a real benefit (dramatically
faster than user code could likely be, does something new and useful,
etc.), then it may well be mergeable.
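
For context on the mode-switch/exit discussion earlier in the thread: with a
userspace monitor, every guest port I/O access returns from KVM_RUN and is
handled in a loop along the lines of the fragment below before the guest is
re-entered. This is an illustrative sketch only, reusing the vcpu fd and the
mmap'ed struct kvm_run from the earlier sketch (and assuming <unistd.h> for
write()).

  /* Illustrative only: the per-exit round trip of a userspace monitor.
   * Every guest port access returns here from KVM_RUN, is handled in
   * userspace, and needs another mode switch to re-enter the guest. */
  static int run_loop(int vcpu, struct kvm_run *run)
  {
      for (;;) {
          ioctl(vcpu, KVM_RUN, 0);                 /* enter the guest */

          switch (run->exit_reason) {              /* back in userspace */
          case KVM_EXIT_IO:
              if (run->io.direction == KVM_EXIT_IO_OUT)
                  /* forward the guest's output to the host (stdout here) */
                  write(1, (char *)run + run->io.data_offset,
                        run->io.size * run->io.count);
              break;                               /* ...and re-enter */
          case KVM_EXIT_HLT:
              return 0;                            /* guest finished */
          default:
              return -1;                           /* unhandled exit */
          }
      }
  }

The argument made in (b) above is that when the guest and its monitor both
live in the kernel, this round trip through userspace disappears and the exit
can be forwarded straight to the relevant net/block subsystem.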