Re: [RFC PATCH 1/6] kvm: add device control API

Scott Wood <scottwood@xxxxxxxxxxxxx> · Tue, 19 Feb 2013 15:16:37 -0600

On 02/19/2013 06:24:18 AM, Gleb Natapov wrote:
On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote:
> The ability to set/get attributes is needed.  Sorry, but "get or set
> one blob of data, up to 512 bytes, for the entire irqchip" is just
> not good enough -- assuming you don't want us to start sticking
> pointers and commands in *that* data. :-)
>
The proposed interface sticks pointers into ioctl data, so why does doing the
same for KVM_SET_IRQCHIP/KVM_GET_IRQCHIP make you smile?

There's a difference between putting a pointer in an ioctl control  
structure that is specifically documented as being that way (as in  
ONE_REG), versus taking an ioctl that claims to be setting/getting a  
blob of state and embedding pointers in it.  It would be like sticking  
a pointer in the attribute payload of this API, which I think is  
something to be discouraged.  It'd also be using KVM_SET_IRQCHIP to  
read data, which is the sort of thing you object to later on regarding  
KVM_IRQ_LINE_STATUS.

Then there's the silliness of transporting 512 bytes just to read a  
descriptor for transporting something else.
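
To make the distinction concrete, here is a minimal sketch in plain C. The struct and function names (ctl_one_reg, state_blob, ctl_get_reg) are hypothetical stand-ins and not kernel definitions; ctl_one_reg only mirrors the id/addr shape of struct kvm_one_reg, where the pointer is a declared part of the control structure rather than something hidden inside a state blob.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical control structure modelled on the id/addr layout of
 * struct kvm_one_reg: "addr" is documented to be a user pointer. */
struct ctl_one_reg {
    uint64_t id;    /* which register to access */
    uint64_t addr;  /* user address where the value lives */
};

/* Hypothetical opaque state blob of the kind a "get/set the whole
 * irqchip" ioctl carries: 512 bytes with no declared pointer field. */
struct state_blob {
    uint8_t data[512];
};

/* Read one 32-bit register through the control structure. Only the
 * 16-byte descriptor describes the transfer; the payload is copied
 * to the address the caller declared in r->addr. */
static int ctl_get_reg(const uint32_t *regs, size_t nregs,
                       const struct ctl_one_reg *r)
{
    if (r->id >= nregs)
        return -EINVAL;
    memcpy((void *)(uintptr_t)r->addr, &regs[r->id], sizeof(uint32_t));
    return 0;
}
```

Under these assumptions, the descriptor is 16 bytes and its pointer is part of the documented contract, whereas doing the same through state_blob would mean moving 512 bytes and reinterpreting some of them as a pointer.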

For signaling irqs (I think this is what you mean by "commands") we  
have KVM_IRQ_LINE.

It's one type of command.  Another is setting the address.  Another is  
writing to registers that have side effects (this is how MSI injection  
is done on MPIC, just as in real hardware).

What is the benefit of KVM_IRQ_LINE over what MPIC does?  What real  
(non-glue/wrapper) code can become common?

And I really hope you don't want us to do MSIs the x86 way.

In the XICS thread, Paul brought up the possibility of cascaded MPICs.
It's not relevant to the systems we're trying to model, but if one did  
want to use the in-kernel irqchip interface for that, it would be  
really nice to be able to operate on a specific MPIC for injection  
rather than have to come up with some sort of global identifier (above  
and beyond the minor flattening we'd need to do to represent a single  
MPIC's interrupts in a flat numberspace).

> If you mean the way to inject interrupts, it's simpler this way.
> Why go out of our way to inject common glue code into a
> communication path between hw/kvm/mpic.c in QEMU and
> arch/powerpc/kvm/mpic.c in KVM?  Or rather, why make that common
> glue be specific to this one function when we could reuse the same
> communication glue used for other things, such as device state?
You will need glue anyway and I do not see how the amount of it is much
different one way or the other.

It uses glue that we need to be present for other things anyway.  If it  
weren't for XICS we wouldn't need a KVM_IRQ_LINE implementation at all  
on PPC.  It may not be a major difference, but it doesn't affect  
anything but MPIC and it seems more straightforward this way.

Gluing qemu_set_irq() to ioctl(KVM_IRQ_LINE) or
ioctl(KVM_SET_DEVICE_ATTR) is not much different.

qemu_set_irq() is not glued to either of those.  It's glued to  
kvm_openpic_set_irq(), kvm_ioapic_set_irq(), etc.  It's already not  
generic code.

Of course, since the interface you propose is not irq chip specific

This part of it is.

we need a non-irq-chip-specific way to talk to it. But how do you propose
to model things like KVM_IRQ_LINE_STATUS with KVM_SET_DEVICE_ATTR?

That one's not even in api.txt, so could you explain what exactly it's  
supposed to return, and why it's needed?

AFAICT, the only thing it gets used for in QEMU is coalescing  
mc146818rtc interrupts.

Could an error return be used for cases where the IRQ was not  
delivered, in the very unlikely event that we want to implement  
something similar on MPIC?  Note again that MPIC's decision to use or  
not use KVM_IRQ_LINE is only about what MPIC does; it is not inherent  
in the device control API.
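
As a rough illustration of the error-return idea, the sketch below uses a mock irqchip in plain C. Everything in it is hypothetical: mock_irqchip, mock_inject_irq, the 32-line limit, and the choice of -EBUSY to mean "already pending, so coalesced" are assumptions for the example, not anything MPIC or KVM defines.

```c
#include <errno.h>
#include <stdint.h>

#define MOCK_NUM_IRQS 32  /* assumed size of the mock chip */

/* Hypothetical irqchip state: one pending bit per interrupt line. */
struct mock_irqchip {
    uint32_t pending;
};

/* Inject an interrupt through a "set"-style call. The caller learns
 * about non-delivery from the return value alone, so no data has to
 * travel back through the payload. */
static int mock_inject_irq(struct mock_irqchip *chip, unsigned int irq)
{
    uint32_t bit;

    if (irq >= MOCK_NUM_IRQS)
        return -EINVAL;          /* no such interrupt line */

    bit = UINT32_C(1) << irq;
    if (chip->pending & bit)
        return -EBUSY;           /* assumed code: not delivered, coalesced */

    chip->pending |= bit;
    return 0;                    /* delivered */
}
```

A caller that cares about coalescing, as the RTC emulation does, would check for the assumed -EBUSY result; callers that do not care can ignore the distinction.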

KVM_SET_DEVICE_ATTR needs to return data back, and getting data back from a
"set" ioctl is strange.

If we really need a single atomic operation to both read and write,  
beyond returning error values, then yes, that would be a new ioctl.  It  
could be added in the future if needed.

Other devices may get other commands that need a
response, so if we design a generic interface we should take it into
account. I think using KVM_SET_DEVICE_ATTR to inject interrupts is a
misnomer: you do not set an internal device attribute, you toggle an
external input. Maybe another ioctl, KVM_SEND_DEVICE_COMMAND, is needed.

I see no need for a separate ioctl in terms of the underlying  
infrastructure for distinguishing "attribute" from "write-only  
command".  I'm open to improvements on what the ioctl is called.  It's  
basically like setting a register on a device, except I was concerned  
that if we actually called it a "register" that people would take it  
too literally and think it's only for the architected register state of  
the emulated device.
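
A small sketch of why the same plumbing can serve both cases, written as a mock device in plain C. The group and attribute names, the mock_dev layout, and the -ENXIO result for reading a write-only entry are all assumptions made for illustration; none of them come from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical attribute groups for a mock device. */
enum { MOCK_GRP_STATE = 0, MOCK_GRP_INPUT = 1 };
enum { MOCK_ATTR_BASE_ADDR = 0 };

struct mock_dev {
    uint64_t base_addr;   /* plain attribute: readable and writable */
    uint32_t line_level;  /* side-effect target: one bit per input line */
};

/* One "set" entry point handles both stored attributes and
 * write-only commands; only the group decides what a write means. */
static int mock_set_attr(struct mock_dev *d, uint32_t group,
                         uint64_t attr, uint64_t val)
{
    switch (group) {
    case MOCK_GRP_STATE:
        if (attr != MOCK_ATTR_BASE_ADDR)
            return -ENXIO;
        d->base_addr = val;
        return 0;
    case MOCK_GRP_INPUT:          /* attr selects the input line */
        if (attr >= 32)
            return -EINVAL;
        if (val)
            d->line_level |= UINT32_C(1) << attr;
        else
            d->line_level &= ~(UINT32_C(1) << attr);
        return 0;
    default:
        return -ENXIO;
    }
}

/* The "get" side simply has no entry for write-only commands. */
static int mock_get_attr(const struct mock_dev *d, uint32_t group,
                         uint64_t attr, uint64_t *val)
{
    if (group == MOCK_GRP_STATE && attr == MOCK_ATTR_BASE_ADDR) {
        *val = d->base_addr;
        return 0;
    }
    return -ENXIO;
}
```

In this sketch a write to MOCK_GRP_INPUT behaves like writing a device register with side effects, which is the analogy drawn above, while MOCK_GRP_STATE behaves like ordinary saved state.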

> >ARM vGIC code, that is ready to go upstream, uses old way too. So
> >it will
> >be 2 archs against one.
>
> I wasn't aware that that's how it worked. :-P
>
What worked? That vGIC uses the existing interface, or that a non-generic
interface used by many arches wins over a generic one used by only one arch?

The latter.  Two wrongs don't make a right, and adding another  
inextensible, device-specific API is not the answer to the existing  
APIs being too inextensible and device/arch-specific.  Some portion  
will always need to be device-specific because we're controlling the  
creation and of a specific device, but the glue does not need to be.

APIs are easy to add and impossible to remove.

That's why I want to get it right this time.

-Scott