On 01/10/2011 10:11 PM, Anthony Liguori wrote:
On 01/08/2011 02:47 AM, Jan Kiszka wrote:
OK, but I don't want to argue about the ioeventfd API. So let's put this
case aside. :)
I often reply too quickly without explaining myself. Let me use
ioeventfd as an example to highlight why KVMState is a good thing.
In real life, PIO and MMIO are never directly communicated to the
device from the processor. Instead, they go through a series of other
devices. In the case of something like an ISA device, a PIO first
goes to the chipset and into the PCI complex, then through a
PCI-to-ISA bridge via subtractive decoding, and is finally forwarded
over the ISA bus to the device that interprets it.
The path to the chipset may be shared among different processors, but
it may also be unique to each one. The APIC is the best example:
there are historic APICs that hung directly off of the CPUs, so that
the same MMIO access issued from different CPUs did not go to the
same device. This is why the APIC emulation in QEMU is so weird: we
don't model this behavior correctly.
This means that a PIO operation needs to flow from a CPUState to a
DeviceState. It can then flow through to another DeviceState until
it's finally handled.
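
To make the shape of that flow concrete, here is a rough sketch; none
of these types or names match the real QEMU code, they only
illustrate the idea of a per-CPU chain of devices:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative only -- not the real QEMU types. */
    typedef struct DeviceState DeviceState;

    /* Returns true if this device claimed the access. */
    typedef bool (*PioWriteFn)(DeviceState *dev, uint16_t port,
                               uint32_t val, unsigned size);

    struct DeviceState {
        const char  *name;
        DeviceState *downstream;   /* chipset -> PCI-to-ISA bridge -> ISA */
        PioWriteFn   pio_write;
    };

    typedef struct CPUState {
        DeviceState *pio_root;     /* per-CPU entry point into the chain */
    } CPUState;

    /* The access starts at the CPU and is forwarded down the chain
     * until something claims it; the bridge passing along whatever the
     * PCI bus didn't claim is the subtractive decoding step. */
    static bool cpu_pio_write(CPUState *cpu, uint16_t port,
                              uint32_t val, unsigned size)
    {
        for (DeviceState *dev = cpu->pio_root; dev; dev = dev->downstream) {
            if (dev->pio_write(dev, port, val, size)) {
                return true;
            }
        }
        return false;              /* nothing claimed the access */
    }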
The first problem with ioeventfd is that it's a per-VM operation. It
should be per VCPU.
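
For reference, the registration today looks roughly like this (error
handling trimmed); note that the ioctl is issued on the VM file
descriptor, so there is no way to say "this port, but only as seen
from one VCPU":

    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Hook 2-byte writes of 'value' to PIO 'port' up to an eventfd. */
    static int hook_pio_write(int vm_fd, uint16_t port, uint16_t value)
    {
        struct kvm_ioeventfd ioev;
        int efd = eventfd(0, 0);

        if (efd < 0) {
            return -1;
        }

        memset(&ioev, 0, sizeof(ioev));
        ioev.addr      = port;
        ioev.len       = 2;
        ioev.datamatch = value;
        ioev.fd        = efd;
        ioev.flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH;

        /* Per-VM: the whole VM, every VCPU, sees the same hook. */
        if (ioctl(vm_fd, KVM_IOEVENTFD, &ioev) < 0) {
            close(efd);
            return -1;
        }
        return efd;
    }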
Just consider ioeventfd as something that hooks the system bus, not the
processor-chipset link, and the problem goes away. In practice, any
per-cpu io port (for SMM or power management) would need synchronous
handling; an eventfd isn't a suitable way to communicate it.
But even if this were the case, the path that a PIO operation takes
should not be impacted by ioeventfd. IOW, a device shouldn't be
allocating an eventfd() and handing it to a magical KVM call.
Instead, a device should register a callback for a particular port in
the same way it always does. *As an optimization*, we should have
another interface that declares which values are valid for this I/O
port. That would let us create eventfds and register things behind
the scenes.
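
Something shaped roughly like this, say (every name below is invented
purely to illustrate the shape of the interface):

    #include <stdint.h>
    #include <stddef.h>

    /* Invented names, for illustration only. */
    typedef void (*IOPortWriteHandler)(void *opaque, uint16_t port,
                                       uint32_t val, unsigned size);
    extern void ioport_register_write(uint16_t port, unsigned size,
                                      IOPortWriteHandler fn, void *opaque);
    extern void ioport_declare_values(uint16_t port, unsigned size,
                                      const uint32_t *values, size_t count);

    static void my_notify_write(void *opaque, uint16_t port,
                                uint32_t val, unsigned size)
    {
        /* ... kick the device's queue ... */
    }

    static void my_device_init(void *dev, uint16_t port)
    {
        /* Register the handler the way devices always have. */
        ioport_register_write(port, 2, my_notify_write, dev);

        /* Optimization hint: only these values will ever arrive on
         * this port and they need nothing beyond the callback, so a
         * backend may satisfy them with eventfds behind the scenes. */
        static const uint32_t values[] = { 0, 1 };
        ioport_declare_values(port, 2, values, 2);
    }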
The semantics are different. The normal callbacks are synchronous: the
vcpu is halted until the callback has been serviced. For most
callbacks this is critical, not just for per-CPU things like vmport
(example: Cirrus VGA bank switching).
I agree it shouldn't be done through a KVM-specific call but through a
proper API; that API, however, needs to be explicitly asynchronous.
When we further thread qemu, we'd also need to specify which thread is
to execute the callback (and the implementation would add the eventfd
to that thread's fd poll list).
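
Roughly something like the following, then (again, invented names; the
key properties are that the contract is asynchronous and that the
caller names the servicing thread):

    #include <stdint.h>

    /* Invented names; a sketch of an explicitly asynchronous variant. */
    typedef struct IOThread IOThread;            /* a qemu I/O thread */
    typedef void (*AsyncWriteHandler)(void *opaque, uint64_t addr,
                                      uint64_t val, unsigned size);

    /* Asynchronous by contract: the vcpu is NOT held while the handler
     * runs, so only accesses declared side-effect free may use this.
     * 'thread' names the thread whose poll loop will run the handler;
     * with kernel support the implementation can back this with an
     * ioeventfd added to that thread's fd list, otherwise it degrades
     * to a deferred call. */
    extern void ioport_register_async_write(IOThread *thread,
                                            uint64_t addr, unsigned size,
                                            uint64_t datamatch,
                                            AsyncWriteHandler fn,
                                            void *opaque);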
That means we can handle TCG, older KVM kernels, and newer KVM kernels
without any special support in the device model. It also means that
the device models never have to worry about KVMState because there's
an entirely different piece of code that's consulting the set of
special ports and then deciding how to handle it. The result is
better, more portable code that doesn't have KVM-isms.
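
The backend-side decision could look something like this (again, none
of these helpers exist under these names; the point is that the choice
lives in one place, not in every device model):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helpers, named only for this sketch. */
    extern bool backend_kvm_ioeventfd_available(void);
    extern void backend_install_ioeventfd(uint16_t port, unsigned size,
                                          uint64_t datamatch);

    static void ioport_backend_commit(uint16_t port, unsigned size,
                                      uint64_t datamatch)
    {
        if (backend_kvm_ioeventfd_available()) {
            /* Newer KVM kernels: hook the port with an ioeventfd and
             * service it from an I/O thread. */
            backend_install_ioeventfd(port, size, datamatch);
        }
        /* Otherwise (TCG, older KVM kernels): nothing to do -- the
         * ordinary synchronous callback the device registered is kept,
         * and the device can't tell the difference. */
    }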
Yes.
If passing state around seems to be ugly, it's probably because we're
not abstracting things correctly. Removing the state and making it
implicit is the wrong solution.
I agree with the general sentiment that utilizing the fact that a
variable is global to make it implicit is bad from a software
engineering point of view. By restricting access to variables and
functions, you can enforce modularity on the code, much like the
private: specifier does in C++ and other languages.
Fixing the abstraction is the right solution (or living with the
ugliness until someone else is motivated to fix it properly).
And I agree with that too, especially the parenthesized statement. We
have qemu-kvm, which is overly pragmatic and trie[sd] to avoid
modifying qemu as much as possible. We have the qemu.git kvm
implementation, which takes a perfectionist approach that failed
because most of the users need the featureful and tested pragmatic
approach. The two mix like oil and water.
--
error compiling committee.c: too many arguments to function