On 25.04.2013, at 11:43, Gleb Natapov wrote: > On Fri, Apr 12, 2013 at 07:08:42PM -0500, Scott Wood wrote: >> Currently, devices that are emulated inside KVM are configured in a >> hardcoded manner based on an assumption that any given architecture >> only has one way to do it. If there's any need to access device state, >> it is done through inflexible one-purpose-only IOCTLs (e.g. >> KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is >> cumbersome and depletes a limited numberspace. >> >> This API provides a mechanism to instantiate a device of a certain >> type, returning an ID that can be used to set/get attributes of the >> device. Attributes may include configuration parameters (e.g. >> register base address), device state, operational commands, etc. It >> is similar to the ONE_REG API, except that it acts on devices rather >> than vcpus. >> >> Both device types and individual attributes can be tested without having >> to create the device or get/set the attribute, without the need for >> separately managing enumerated capabilities. >> >> Signed-off-by: Scott Wood <scottwood@xxxxxxxxxxxxx> >> --- >> v4: >> - Move some boilerplate back into generic code, as requested by Gleb. >> File descriptor management and reference counting is no longer the >> concern of the device implementation. >> >> - Don't hold kvm->lock during create. The original reasons >> for doing so have vanished as for as MPIC is concerned, and >> this avoids needing to answer the question of whether to >> hold the lock during destroy as well. >> >> Paul, you may need to acquire the lock yourself in kvm_create_xics() >> to protect the -EEXIST check. >> >> v3: remove some changes that were merged into this patch by accident, >> and fix the error documentation for KVM_CREATE_DEVICE. >> --- >> Documentation/virtual/kvm/api.txt | 70 ++++++++++++++++ >> Documentation/virtual/kvm/devices/README | 1 + >> include/linux/kvm_host.h | 35 ++++++++ >> include/uapi/linux/kvm.h | 27 +++++++ >> virt/kvm/kvm_main.c | 129 ++++++++++++++++++++++++++++++ >> 5 files changed, 262 insertions(+) >> create mode 100644 Documentation/virtual/kvm/devices/README >> >> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt >> index 976eb65..d52f3f9 100644 >> --- a/Documentation/virtual/kvm/api.txt >> +++ b/Documentation/virtual/kvm/api.txt >> @@ -2173,6 +2173,76 @@ header; first `n_valid' valid entries with contents from the data >> written, then `n_invalid' invalid entries, invalidating any previously >> valid entries found. >> >> +4.79 KVM_CREATE_DEVICE >> + >> +Capability: KVM_CAP_DEVICE_CTRL >> +Type: vm ioctl >> +Parameters: struct kvm_create_device (in/out) >> +Returns: 0 on success, -1 on error >> +Errors: >> + ENODEV: The device type is unknown or unsupported >> + EEXIST: Device already created, and this type of device may not >> + be instantiated multiple times >> + >> + Other error conditions may be defined by individual device types or >> + have their standard meanings. >> + >> +Creates an emulated device in the kernel. The file descriptor returned >> +in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR. >> + >> +If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the >> +device type is supported (not necessarily whether it can be created >> +in the current vm). >> + >> +Individual devices should not define flags. Attributes should be used >> +for specifying any behavior that is not implied by the device type >> +number. >> + >> +struct kvm_create_device { >> + __u32 type; /* in: KVM_DEV_TYPE_xxx */ >> + __u32 fd; /* out: device handle */ >> + __u32 flags; /* in: KVM_CREATE_DEVICE_xxx */ >> +}; > Should we add __u32 padding here to make struct size multiple of u64? Do you know of any arch that pads structs to u64 boundaries? x86_64 doesn't and ppc64 doesn't either. > >> + >> +4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR >> + >> +Capability: KVM_CAP_DEVICE_CTRL >> +Type: device ioctl >> +Parameters: struct kvm_device_attr >> +Returns: 0 on success, -1 on error >> +Errors: >> + ENXIO: The group or attribute is unknown/unsupported for this device >> + EPERM: The attribute cannot (currently) be accessed this way >> + (e.g. read-only attribute, or attribute that only makes >> + sense when the device is in a different state) >> + >> + Other error conditions may be defined by individual device types. >> + >> +Gets/sets a specified piece of device configuration and/or state. The >> +semantics are device-specific. See individual device documentation in >> +the "devices" directory. As with ONE_REG, the size of the data >> +transferred is defined by the particular attribute. >> + >> +struct kvm_device_attr { >> + __u32 flags; /* no flags currently defined */ >> + __u32 group; /* device-defined */ >> + __u64 attr; /* group-defined */ >> + __u64 addr; /* userspace address of attr data */ >> +}; >> + >> +4.81 KVM_HAS_DEVICE_ATTR >> + >> +Capability: KVM_CAP_DEVICE_CTRL >> +Type: device ioctl >> +Parameters: struct kvm_device_attr >> +Returns: 0 on success, -1 on error >> +Errors: >> + ENXIO: The group or attribute is unknown/unsupported for this device >> + >> +Tests whether a device supports a particular attribute. A successful >> +return indicates the attribute is implemented. It does not necessarily >> +indicate that the attribute can be read or written in the device's >> +current state. "addr" is ignored. >> >> 4.77 KVM_ARM_VCPU_INIT >> >> diff --git a/Documentation/virtual/kvm/devices/README b/Documentation/virtual/kvm/devices/README >> new file mode 100644 >> index 0000000..34a6983 >> --- /dev/null >> +++ b/Documentation/virtual/kvm/devices/README >> @@ -0,0 +1 @@ >> +This directory contains specific device bindings for KVM_CAP_DEVICE_CTRL. >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h >> index 20d77d2..8fce9bc 100644 >> --- a/include/linux/kvm_host.h >> +++ b/include/linux/kvm_host.h >> @@ -1063,6 +1063,41 @@ static inline bool kvm_check_request(int req, struct kvm_vcpu *vcpu) >> >> extern bool kvm_rebooting; >> >> +struct kvm_device_ops; >> + >> +struct kvm_device { >> + struct kvm_device_ops *ops; >> + struct kvm *kvm; >> + atomic_t users; >> + void *private; >> +}; >> + >> +/* create, destroy, and name are mandatory */ >> +struct kvm_device_ops { >> + const char *name; >> + int (*create)(struct kvm_device *dev, u32 type); >> + >> + /* >> + * Destroy is responsible for freeing dev. >> + * >> + * Destroy may be called before or after destructors are called >> + * on emulated I/O regions, depending on whether a reference is >> + * held by a vcpu or other kvm component that gets destroyed >> + * after the emulated I/O. >> + */ >> + void (*destroy)(struct kvm_device *dev); >> + >> + int (*set_attr)(struct kvm_device *dev, struct kvm_device_attr *attr); >> + int (*get_attr)(struct kvm_device *dev, struct kvm_device_attr *attr); >> + int (*has_attr)(struct kvm_device *dev, struct kvm_device_attr *attr); >> + long (*ioctl)(struct kvm_device *dev, unsigned int ioctl, >> + unsigned long arg); >> +}; >> + >> +void kvm_device_get(struct kvm_device *dev); >> +void kvm_device_put(struct kvm_device *dev); >> +struct kvm_device *kvm_device_from_filp(struct file *filp); >> + >> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT >> >> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val) >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h >> index 74d0ff3..20ce2d2 100644 >> --- a/include/uapi/linux/kvm.h >> +++ b/include/uapi/linux/kvm.h >> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info { >> #define KVM_CAP_PPC_EPR 86 >> #define KVM_CAP_ARM_PSCI 87 >> #define KVM_CAP_ARM_SET_DEVICE_ADDR 88 >> +#define KVM_CAP_DEVICE_CTRL 89 >> >> #ifdef KVM_CAP_IRQ_ROUTING >> >> @@ -909,6 +910,32 @@ struct kvm_s390_ucas_mapping { >> #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr) >> >> /* >> + * Device control API, available with KVM_CAP_DEVICE_CTRL >> + */ >> +#define KVM_CREATE_DEVICE_TEST 1 >> + >> +struct kvm_create_device { >> + __u32 type; /* in: KVM_DEV_TYPE_xxx */ >> + __u32 fd; /* out: device handle */ >> + __u32 flags; /* in: KVM_CREATE_DEVICE_xxx */ >> +}; >> + >> +struct kvm_device_attr { >> + __u32 flags; /* no flags currently defined */ >> + __u32 group; /* device-defined */ >> + __u64 attr; /* group-defined */ >> + __u64 addr; /* userspace address of attr data */ >> +}; > Please move struct definitions and KVM_CREATE_DEVICE_TEST define out > from ioctl definition block. Let me change that in my tree... > >> + >> +/* ioctl for vm fd */ >> +#define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device) >> + >> +/* ioctls for fds returned by KVM_CREATE_DEVICE */ >> +#define KVM_SET_DEVICE_ATTR _IOW(KVMIO, 0xe1, struct kvm_device_attr) >> +#define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr) >> +#define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr) >> + >> +/* >> * ioctls for vcpu fds >> */ >> #define KVM_RUN _IO(KVMIO, 0x80) >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >> index 5cc53c9..e2b18af 100644 >> --- a/virt/kvm/kvm_main.c >> +++ b/virt/kvm/kvm_main.c >> @@ -2158,6 +2158,117 @@ out: >> } >> #endif >> >> +static int kvm_device_ioctl_attr(struct kvm_device *dev, >> + int (*accessor)(struct kvm_device *dev, >> + struct kvm_device_attr *attr), >> + unsigned long arg) >> +{ >> + struct kvm_device_attr attr; >> + >> + if (!accessor) >> + return -EPERM; >> + >> + if (copy_from_user(&attr, (void __user *)arg, sizeof(attr))) >> + return -EFAULT; >> + >> + return accessor(dev, &attr); >> +} >> + >> +static long kvm_device_ioctl(struct file *filp, unsigned int ioctl, >> + unsigned long arg) >> +{ >> + struct kvm_device *dev = filp->private_data; >> + >> + switch (ioctl) { >> + case KVM_SET_DEVICE_ATTR: >> + return kvm_device_ioctl_attr(dev, dev->ops->set_attr, arg); >> + case KVM_GET_DEVICE_ATTR: >> + return kvm_device_ioctl_attr(dev, dev->ops->get_attr, arg); >> + case KVM_HAS_DEVICE_ATTR: >> + return kvm_device_ioctl_attr(dev, dev->ops->has_attr, arg); >> + default: >> + if (dev->ops->ioctl) >> + return dev->ops->ioctl(dev, ioctl, arg); >> + >> + return -ENOTTY; >> + } >> +} >> + >> +void kvm_device_get(struct kvm_device *dev) >> +{ >> + atomic_inc(&dev->users); >> +} >> + >> +void kvm_device_put(struct kvm_device *dev) >> +{ >> + if (atomic_dec_and_test(&dev->users)) >> + dev->ops->destroy(dev); >> +} >> + >> +static int kvm_device_release(struct inode *inode, struct file *filp) >> +{ >> + struct kvm_device *dev = filp->private_data; >> + struct kvm *kvm = dev->kvm; >> + >> + kvm_device_put(dev); >> + kvm_put_kvm(kvm); > We may put kvm only if users goes to zero, otherwise kvm can be > freed while something holds a reference to a device. Why not make > kvm_device_put() do it? Nice catch. I'll change the patch so it does the kvm_put_kvm inside kvm_device_put's destroy branch. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html