On 02/27/2010 11:25 AM, Ingo Molnar wrote:
> * Zachary Amsden <zamsden@xxxxxxxxxx> wrote:
[...]
>> Second, it's not over-modularized. The modules are the individual
>> components of the architecture. How would you propose to put it
>> differently? They really can't naturally combine. And with the
>> code quality of qemu in general being problematic by Linux kernel
>> standards, it's not natural to move the device emulation directly
>> into the kernel module. So this is why we are where we are today.
> I'm not talking about moving it into a kernel _module_ - albeit that
> alone is a worthwhile thing to do for any performance sensitive hw
> component.
> I was talking about the option of a clean, stripped down Qemu base
> hosted in the kernel proper, in linux/tools/kvm/ or so. If i were
> running a virtualization effort it would be the first place i'd
> consider to put my tooling into.
Let's ignore the suggestion of hosting it in the kernel. There's no
reason it couldn't be as successful hosted as a separate project.
Let's consider what you would strip out of qemu. You would obviously
pull out TCG and the device emulation that isn't useful for KVM. You
can't compile out TCG today, but you can already compile out most
device emulation, so this doesn't actually buy you much. It certainly
doesn't fix any of the problems you outlined.
The GUI wouldn't change at all. You still have the same fundamental
problem: whatever this userspace executable is, it is not the place
to implement a user-friendly GUI. That has to be a separate process.
Maybe you could integrate that separate process into the same
repository as the core process, but we can do that with qemu too.
> It would be a no-brainer: most of the devs come from the KVM side, and
> KVM itself makes little sense without Qemu, and Qemu makes little sense
> without KVM these days. (and i know about the non-KVM and non-x86
> roots of Qemu - still, it's not a significant piece of usage today)
Do you have statistics to back this up? You would probably be surprised
at how many people use TCG.
To be honest, every KVM developer including myself has considered and
even prototyped exactly what you described. We've all independently
come to the same conclusion: it's easier to incrementally improve qemu
than it is to split the code base and try to maintain the fork.
And a lot of the other vendors who have decided to fork qemu in the past
have learned the hard way that it's more difficult to maintain a fork
and are now merging back to upstream qemu.
We could certainly make the same argument about forking the kernel to
make it optimized for virtualization. If we took Linux and added it to
the qemu git tree, we would instantly have transparent large page
support for users instead of having to wait years to get it properly
integrated. We could also add gang scheduling and hard scheduler limits
to the kernel. But we know better and even though the process is more
painful and drawn out, we end up with a much better solution in the long
run by including the input and feedback from people like you.
Xen clearly made a different decision and is still suffering the
consequences. They've done the same thing with qemu as you describe and
have now realized it was a mistake and are working to merge their
changes into upstream qemu.
There are *plenty* of usability issues (like transparent large pages)
that need to be addressed at the KVM/kernel level. Today, a user has to
choose between a ~30% decrease in performance on Java workloads or the
ability to overcommit memory. It's a pretty significant problem and
there's been a lot of resistance within the kernel community to fixing it.
Likewise, I'm seeing a good number of people hit problems with
lock-holder preemption in the field. It's absolutely a usability problem
when a user sees catastrophically bad performance running an 8-VCPU
virtual machine on a 2-socket host. Whether it's gang scheduling or
directed yields + pause loop detection, we definitely need some
scheduler changes to fix this problem.
Not having an option enabled by default is an annoyance that a user
eventually overcomes with the help of documentation. Performance
problems are deal breakers that lead users to switch to another
virtualization technology.
Just stripping down qemu and putting the result in the kernel source
tree doesn't fix anything. We have plenty of hard problems to solve
already.
Regards,
Anthony Liguori