On Fri, Jan 08, 2016 at 10:53:03AM +0800, Shannon Zhao wrote: > > > On 2016/1/8 4:18, Andrew Jones wrote: > > On Tue, Dec 22, 2015 at 04:08:15PM +0800, Shannon Zhao wrote: > >> From: Shannon Zhao <shannon.zhao@xxxxxxxxxx> > >> > >> Add a new kvm device type KVM_DEV_TYPE_ARM_PMU_V3 for ARM PMU. Implement > >> the kvm_device_ops for it. > >> > >> Signed-off-by: Shannon Zhao <shannon.zhao@xxxxxxxxxx> > >> --- > >> Documentation/virtual/kvm/devices/arm-pmu.txt | 24 +++++ > >> arch/arm64/include/uapi/asm/kvm.h | 4 + > >> include/linux/kvm_host.h | 1 + > >> include/uapi/linux/kvm.h | 2 + > >> virt/kvm/arm/pmu.c | 128 ++++++++++++++++++++++++++ > >> virt/kvm/kvm_main.c | 4 + > >> 6 files changed, 163 insertions(+) > >> create mode 100644 Documentation/virtual/kvm/devices/arm-pmu.txt > >> > >> diff --git a/Documentation/virtual/kvm/devices/arm-pmu.txt b/Documentation/virtual/kvm/devices/arm-pmu.txt > >> new file mode 100644 > >> index 0000000..dda864e > >> --- /dev/null > >> +++ b/Documentation/virtual/kvm/devices/arm-pmu.txt > >> @@ -0,0 +1,24 @@ > >> +ARM Virtual Performance Monitor Unit (vPMU) > >> +=========================================== > >> + > >> +Device types supported: > >> + KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor Unit v3 > >> + > >> +Instantiate one PMU instance for per VCPU through this API. > > ^^ don't need this 'for' here > > > >> + > >> +Groups: > >> + KVM_DEV_ARM_PMU_GRP_IRQ > >> + Attributes: > >> + The attr field of kvm_device_attr encodes one value: > >> + bits: | 63 .... 32 | 31 .... 0 | > >> + values: | reserved | vcpu_index | > > > > Everywhere else in kvm documentation a vcpu_index is 8 bits. I'm not > > saying that that's good, but expanding it to 32 bits here is > > inconsistent. > Expand it to 32 bits just in case it needs to support more than 256 > vcpus in the future. If you think this is not good, I'll change it to 8 > bits. I agree that eventually we'll want more than 256 vcpus. However the kvm api already has the concept of a vcpu-index, which shows up in two other places, KVM_IRQ_LINE and KVM_DEV_ARM_VGIC_*. In both those other places it's only 8 bits. While there may not be a KVM_VCPU_INDEX_MASK=0xff somewhere, I still think we should keep things consistent. Hmm, or each device should define its own index. Actually, that sounds better, as each device may have a different max-vcpus limit; KVM_IRQ_LINE has 8 bits due to the x86 apic supporting 8 bits, gicv2 could have gotten by with only 3 bits, and gicv3 can use much, much more. When we want more than 256 vcpus we'll need a new kvm-irq-line ioctl, but, as for the arm devices, we could avoid the problem completely by extending the SET/GET_DEVICE_ATTR ioctl to also be a vcpu ioctl. > > (A side note is that I think we should s/vcpu_index/vcpu_id/ > > in order to keep the parameter name consistent with what's used by > > KVM_CREATE_VCPU) > > > Eh, I make it consistent with vGIC which is a kvm device that's also > created via KVM_CREATE_DEVICE API. So they are different names but > express same thing. I knew you are consistent with arm-vgic.txt, and also with KVM_IRQ_LINE. I was just thinking that it'd be nice for everything to be consistent with KVM_CREATE_VCPU. However, now I'm of the mind that the vcpu-id (32 bits) should always be independent of device vcpu-indexes, as we don't want to limit the number of vcpus (by limiting vcpu-ids) to the maximum of the most-limited device we support. > > >> + A value describing the PMU overflow interrupt number for the specified > >> + vcpu_index vcpu. This interrupt could be a PPI or SPI, but for one VM the > >> + interrupt type must be same for each vcpu. As a PPI, the interrupt number is > >> + same for all vcpus, while as a SPI it must be different for each vcpu. > >> + > >> + Errors: > >> + -ENXIO: Unsupported attribute group > > > > Do we need to specify ENXIO here? It's already specified in > > Documentation/virtual/kvm/api.txt for SET/GET_DEVICE_ATTR > > > But specifying it here is not bad, right? If you insist on this, I'll > drop it. It's fine, as the specification in SET/GET_DEVICE_ATTR section isn't going to change, but we require users of this api to read that section already, so I feel it's redundant and prefer to avoid redundancy. I won't insist. > > >> + -EBUSY: The PMU overflow interrupt is already set > >> + -ENODEV: Getting the PMU overflow interrupt number while it's not set > >> + -EINVAL: Invalid vcpu_index or PMU overflow interrupt number supplied > >> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h > >> index 2d4ca4b..cbb9022 100644 > >> --- a/arch/arm64/include/uapi/asm/kvm.h > >> +++ b/arch/arm64/include/uapi/asm/kvm.h > >> @@ -204,6 +204,10 @@ struct kvm_arch_memory_slot { > >> #define KVM_DEV_ARM_VGIC_GRP_CTRL 4 > >> #define KVM_DEV_ARM_VGIC_CTRL_INIT 0 > >> > >> +/* Device Control API: ARM PMU */ > >> +#define KVM_DEV_ARM_PMU_GRP_IRQ 0 > >> +#define KVM_DEV_ARM_PMU_CPUID_MASK 0xffffffffULL > > > > I don't think we should need a mask like this, at least not in the uapi. > > The documentation says to use a vcpu-index [really a vcpu-id], and users > > should know how to convert their vcpu handle (whatever that may be) to > > a vcpu-id using whatever transformation is necessary. They may already > > have a "vcpu id mask" defined (this is one reason why I don't think we > > should use a 32-bit vcpu-index here, instead of the 8-bit vcpu-index used > > everywhere else). Likewise, kvm should know how to do it's transformation. > > Maybe, to aid in the attr field construction, we should supply a shift, > > allowing both sides to do something like > > > > int pmu_attr_extract_vcpu_id(u64 attr) > > { > > return (attr >> KVM_DEV_ARM_PMU_VCPUID_SHIFT) & VCPU_ID_MASK; > So what's the value of the VCPU_ID_MASK? I didn't see any definitions > about it. It looks like KVM_DEV_ARM_PMU_CPUID_MASK(just has a difference > between 32 bits and 8 bits) Whether or not a user would define a VCPU_ID_MASK == 0xff depends on whether or not they understand 'vcpu-index' to be a consistent concept everywhere they see it in the documentation. I now would prefer vcpu-index being device specific. Going that way, then I think the documentation could be updated to help make that clearer, e.g. calling the gic vcpu-index gic_vcpu_index or something. > > > } > > > > u64 pmu_attr_create(int vcpu_id) > > { > > return vcpu_id << KVM_DEV_ARM_PMU_VCPUID_SHIFT; > > } > > > > But, in this case the shift is zero, so it's not really necessary. In > > any case, please add the 'V' for VCPU. > > > >> + > >> /* KVM_IRQ_LINE irq field index values */ > >> #define KVM_ARM_IRQ_TYPE_SHIFT 24 > >> #define KVM_ARM_IRQ_TYPE_MASK 0xff > >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > >> index c923350..608dea6 100644 > >> --- a/include/linux/kvm_host.h > >> +++ b/include/linux/kvm_host.h > >> @@ -1161,6 +1161,7 @@ extern struct kvm_device_ops kvm_mpic_ops; > >> extern struct kvm_device_ops kvm_xics_ops; > >> extern struct kvm_device_ops kvm_arm_vgic_v2_ops; > >> extern struct kvm_device_ops kvm_arm_vgic_v3_ops; > >> +extern struct kvm_device_ops kvm_arm_pmu_ops; > >> > >> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT > >> > >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > >> index 03f3618..4ba6fdd 100644 > >> --- a/include/uapi/linux/kvm.h > >> +++ b/include/uapi/linux/kvm.h > >> @@ -1032,6 +1032,8 @@ enum kvm_device_type { > >> #define KVM_DEV_TYPE_FLIC KVM_DEV_TYPE_FLIC > >> KVM_DEV_TYPE_ARM_VGIC_V3, > >> #define KVM_DEV_TYPE_ARM_VGIC_V3 KVM_DEV_TYPE_ARM_VGIC_V3 > >> + KVM_DEV_TYPE_ARM_PMU_V3, > >> +#define KVM_DEV_TYPE_ARM_PMU_V3 KVM_DEV_TYPE_ARM_PMU_V3 > >> KVM_DEV_TYPE_MAX, > >> }; > >> > >> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c > >> index 3ec3cdd..5518308 100644 > >> --- a/virt/kvm/arm/pmu.c > >> +++ b/virt/kvm/arm/pmu.c > >> @@ -19,6 +19,7 @@ > >> #include <linux/kvm.h> > >> #include <linux/kvm_host.h> > >> #include <linux/perf_event.h> > >> +#include <linux/uaccess.h> > >> #include <asm/kvm_emulate.h> > >> #include <kvm/arm_pmu.h> > >> #include <kvm/arm_vgic.h> > >> @@ -374,3 +375,130 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, > >> > >> pmc->perf_event = event; > >> } > >> + > >> +static inline bool kvm_arm_pmu_initialized(struct kvm_vcpu *vcpu) > >> +{ > >> + return vcpu->arch.pmu.irq_num != -1; > >> +} > >> + > >> +static int kvm_arm_pmu_irq_access(struct kvm *kvm, struct kvm_device_attr *attr, > >> + int *irq, bool is_set) > >> +{ > >> + int cpuid; > > > > please call this vcpu_id > > > >> + struct kvm_vcpu *vcpu; > >> + struct kvm_pmu *pmu; > >> + > >> + cpuid = attr->attr & KVM_DEV_ARM_PMU_CPUID_MASK; > >> + if (cpuid >= atomic_read(&kvm->online_vcpus)) > >> + return -EINVAL; > >> + > >> + vcpu = kvm_get_vcpu(kvm, cpuid); > >> + if (!vcpu) > >> + return -EINVAL; > >> + > >> + pmu = &vcpu->arch.pmu; > >> + if (!is_set) { > >> + if (!kvm_arm_pmu_initialized(vcpu)) > >> + return -ENODEV; > >> + > >> + *irq = pmu->irq_num; > >> + } else { > >> + if (kvm_arm_pmu_initialized(vcpu)) > >> + return -EBUSY; > >> + > >> + kvm_debug("Set kvm ARM PMU irq: %d\n", *irq); > >> + pmu->irq_num = *irq; > >> + } > >> + > >> + return 0; > >> +} > >> + > >> +static int kvm_arm_pmu_create(struct kvm_device *dev, u32 type) > >> +{ > >> + int i; > >> + struct kvm_vcpu *vcpu; > >> + struct kvm *kvm = dev->kvm; > >> + > >> + kvm_for_each_vcpu(i, vcpu, kvm) { > >> + struct kvm_pmu *pmu = &vcpu->arch.pmu; > >> + > >> + memset(pmu, 0, sizeof(*pmu)); > > > > I don't think we want this memset. If we can only create the pmu > > once, then it's unnecessary (we zalloc vcpus). And, if we can > > recreate pmus with this call, then it'll create a memory leak, as > > we'll be zero-ing out all the perf_event pointers, and then won't > > be able to free them on the call to kvm_pmu_vcpu_reset. Naturally > > we need to make sure we're NULL-ing them after each free instead. > > > Ok, will drop this. > > >> + kvm_pmu_vcpu_reset(vcpu); > >> + pmu->irq_num = -1; > >> + } > >> + > >> + return 0; > >> +} > >> + > >> +static void kvm_arm_pmu_destroy(struct kvm_device *dev) > >> +{ > >> + kfree(dev); > >> +} > >> + > >> +static int kvm_arm_pmu_set_attr(struct kvm_device *dev, > >> + struct kvm_device_attr *attr) > >> +{ > >> + switch (attr->group) { > >> + case KVM_DEV_ARM_PMU_GRP_IRQ: { > >> + int __user *uaddr = (int __user *)(long)attr->addr; > >> + int reg; > >> + > >> + if (get_user(reg, uaddr)) > >> + return -EFAULT; > >> + > >> + /* > >> + * The PMU overflow interrupt could be a PPI or SPI, but for one > >> + * VM the interrupt type must be same for each vcpu. As a PPI, > >> + * the interrupt number is same for all vcpus, while as a SPI it > >> + * must be different for each vcpu. > >> + */ > >> + if (reg < VGIC_NR_SGIS || reg >= dev->kvm->arch.vgic.nr_irqs) > >> + return -EINVAL; > >> + > >> + return kvm_arm_pmu_irq_access(dev->kvm, attr, ®, true); > >> + } > >> + } > >> + > >> + return -ENXIO; > >> +} > >> + > >> +static int kvm_arm_pmu_get_attr(struct kvm_device *dev, > >> + struct kvm_device_attr *attr) > >> +{ > >> + int ret; > >> + > >> + switch (attr->group) { > >> + case KVM_DEV_ARM_PMU_GRP_IRQ: { > >> + int __user *uaddr = (int __user *)(long)attr->addr; > >> + int reg = -1; > >> + > >> + > >> + ret = kvm_arm_pmu_irq_access(dev->kvm, attr, ®, false); > >> + if (ret) > >> + return ret; > >> + return put_user(reg, uaddr); > >> + } > >> + } > >> + > >> + return -ENXIO; > >> +} > >> + > >> +static int kvm_arm_pmu_has_attr(struct kvm_device *dev, > >> + struct kvm_device_attr *attr) > >> +{ > >> + switch (attr->group) { > >> + case KVM_DEV_ARM_PMU_GRP_IRQ: > >> + return 0; > >> + } > >> + > >> + return -ENXIO; > >> +} > >> + > >> +struct kvm_device_ops kvm_arm_pmu_ops = { > >> + .name = "kvm-arm-pmu", > >> + .create = kvm_arm_pmu_create, > >> + .destroy = kvm_arm_pmu_destroy, > >> + .set_attr = kvm_arm_pmu_set_attr, > >> + .get_attr = kvm_arm_pmu_get_attr, > >> + .has_attr = kvm_arm_pmu_has_attr, > >> +}; > >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > >> index 484079e..81a42cc 100644 > >> --- a/virt/kvm/kvm_main.c > >> +++ b/virt/kvm/kvm_main.c > >> @@ -2647,6 +2647,10 @@ static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = { > >> #ifdef CONFIG_KVM_XICS > >> [KVM_DEV_TYPE_XICS] = &kvm_xics_ops, > >> #endif > >> + > >> +#ifdef CONFIG_KVM_ARM_PMU > >> + [KVM_DEV_TYPE_ARM_PMU_V3] = &kvm_arm_pmu_ops, > > > > Shouldn't we specify 'v3' in the kvm_arm_pmu_ops name, as we do with the > > device type name? > > > Sure, will add. > > >> +#endif > >> }; > >> > >> int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type) > >> -- > >> 2.0.4 > >> Thanks, drew -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html