On Wed, Feb 20, 2013 at 05:20:50PM -0600, Scott Wood wrote: > On 02/20/2013 03:17:27 PM, Marcelo Tosatti wrote: > >On Mon, Feb 18, 2013 at 05:01:40PM -0600, Scott Wood wrote: > >> On 02/18/2013 06:21:59 AM, Gleb Natapov wrote: > >> >Copying Christoffer since ARM has in kernel irq chip too. > >> > > >> >On Wed, Feb 13, 2013 at 11:49:15PM -0600, Scott Wood wrote: > >> >> Currently, devices that are emulated inside KVM are > >configured in a > >> >> hardcoded manner based on an assumption that any given > >architecture > >> >> only has one way to do it. If there's any need to access device > >> >state, > >> >> it is done through inflexible one-purpose-only IOCTLs (e.g. > >> >> KVM_GET/SET_LAPIC). Defining new IOCTLs for every little > >thing is > >> >> cumbersome and depletes a limited numberspace. > > > >Creation of ioctl has advantages. It makes explicit what the > >data contains and how it should be interpreted. > >This means that for example, policy control can be performed at ioctl > >level (can group, by ioctl number, which ioctl calls are allowed after > >VM creation, for example). > > > >It also makes it explicit that its a userspace interface which > >applications depend. > > > >With a single 'set device attribute' ioctl its more difficult > >to do that. > > I don't see why creating ioctls rather than device attributes (or > whatever you want to call them) differs this way. Device attributes > are inherently userspace interfaces, just as ioctls are. What the > data contains and how it is interpreted are documented, albeit in a > more lightweight format than KVM usually uses for ioctls. > > An ioctl is no guarantee of good documentation. KVM is far better > than most of the kernel in that regard, but even there some ioctls > are missing (e.g. KVM_IRQ_LINE_STATUS as mentioned elsewhere in this > thread), and some others are inadequately explained (e.g. IRQ > routing). > > By "policy control", do you just mean that some ioctls act on a > /dev/kvm fd, others on a vm fd, and others on a vcpu fd? This is > pretty similar to having a device fd, except for the mechanism used. No, i mean policy control acting on ioctl numbers. Its essentially the same issue as with 'strace' (in that the its necessary to parse the ioctl data further). > The main things that I dislike about adding new things at the ioctl > level are: > - limited numberspace, and more prone to merge conflicts than a > device-specific numberspace Can reserve per-architecture ranges to deal with that. > - having to add a new pathway to get into the driver for each ioctl, > rather than having all operations on a particular device go to the > right place, and the device implementation interprets further > (this assumes that we're talking about vm ioctls, and not returning > a > new fd for the device) Agree with that point. > - awkward way of negotiating capabilities with userspace (have to > declare the capability id separately, and add code somewhere outside > the > device implementation to respond to capability requests) Agree with that point. > - api.txt formatting that imposes yet another document-internal > numberspace to conflict on :-) The main problem, to me, is that the interface allows flexibility but the details of using such flexibility are not worked out. That is, lack of definitions such as when setting attributes is allowed. With a big blog, that information is implicit: a SET operation is a full device reset. With individual attributes: - Which attributes can be set individually? - Is there an order which must be respected? - Locking. - End result: more logic/code necessary. > >Abstracting the details also makes it cumbersome to read > >strace output :-) > > You'd have to update strace one way or another if you want > pretty-printed output. Having a more restricted API than arbitrary > ioctls could actually improve the situation -- this could be a good > reason for including the attribute length as an explicit parameter > rather than being implicit in the attribute id. Then you'd just > teach strace about the device control API once, and not for each new > attribute or device that gets added. > > >> >> This API provides a mechanism to instantiate a device of a > >certain > >> >> type, returning an ID that can be used to set/get attributes > >of the > >> >> device. Attributes may include configuration parameters (e.g. > >> >> register base address), device state, operational commands, > >etc. It > >> >> is similar to the ONE_REG API, except that it acts on devices > >rather > >> >> than vcpus. > >> >You are not only provide different way to create in kernel irq > >> >chip you > >> >also use an alternate way to trigger interrupt lines. Before going > >> >into > >> >interface specifics lets think about whether it is really worth it? > >> > >> Which "it" do you mean here? > >> > >> The ability to set/get attributes is needed. Sorry, but "get or set > >> one blob of data, up to 512 bytes, for the entire irqchip" is just > >> not good enough -- assuming you don't want us to start sticking > >> pointers and commands in *that* data. :-) > > > >Why not? Is it necessary to constantly read/write attributes? > > It's not about how often we do it, but how much state we have, > especially if we ever want to implement migration. Migration reads/writes the device state once, which is supposedly much smaller than VM's RAM, so can't see the logic here. > >> If you mean the way to inject interrupts, it's simpler this way. > > > >Are you injecting interrupts via this new SET_DEVICE_ATTRIBUTE ioctl? > > Yes, but even if that gets shot down (the best objection IMHO is the > one nobody is raising -- how to hook into irqfd), we still need the > rest of it. Why are individual attributes necessary ? Still unclear. How about dropping it? And then assume full blob write is a device reset. > I think we'd even want to keep attributes for IRQ line > status so that we have a way to read it, even if just for debugging. No problem reading full blob on that case. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html