On Thu, May 04, 2017 at 11:16:57AM +0100, Marc Zyngier wrote: > On 04/05/17 10:44, Christoffer Dall wrote: > > On Thu, May 04, 2017 at 10:05:43AM +0100, Marc Zyngier wrote: > >> On 04/05/17 09:38, Christoffer Dall wrote: > >>> On Thu, May 04, 2017 at 09:28:50AM +0100, Marc Zyngier wrote: > >>>> On 04/05/17 09:13, Christoffer Dall wrote: > >>>>> On Thu, May 04, 2017 at 09:09:47AM +0100, Marc Zyngier wrote: > >>>>>> On 03/05/17 19:32, Christoffer Dall wrote: > >>>>>>> Since we got support for devices in userspace which allows reporting the > >>>>>>> PMU overflow output status to userspace, we should actually allow > >>>>>>> creating the PMU on systems without an in-kernel irqchip, which in turn > >>>>>>> requires us to slightly clarify error codes for the ABI and move things > >>>>>>> around for the initialization phase. > >>>>>>> > >>>>>>> Signed-off-by: Christoffer Dall <cdall@xxxxxxxxxx> > >>>>>>> --- > >>>>>>> Documentation/virtual/kvm/devices/vcpu.txt | 16 +++++++++------- > >>>>>>> virt/kvm/arm/pmu.c | 27 +++++++++++++++++---------- > >>>>>>> 2 files changed, 26 insertions(+), 17 deletions(-) > >>>>>>> > >>>>>>> diff --git a/Documentation/virtual/kvm/devices/vcpu.txt b/Documentation/virtual/kvm/devices/vcpu.txt > >>>>>>> index 02f5068..352af6e 100644 > >>>>>>> --- a/Documentation/virtual/kvm/devices/vcpu.txt > >>>>>>> +++ b/Documentation/virtual/kvm/devices/vcpu.txt > >>>>>>> @@ -16,7 +16,9 @@ Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a > >>>>>>> Returns: -EBUSY: The PMU overflow interrupt is already set > >>>>>>> -ENXIO: The overflow interrupt not set when attempting to get it > >>>>>>> -ENODEV: PMUv3 not supported > >>>>>>> - -EINVAL: Invalid PMU overflow interrupt number supplied > >>>>>>> + -EINVAL: Invalid PMU overflow interrupt number supplied or > >>>>>>> + trying to set the IRQ number without using an in-kernel > >>>>>>> + irqchip. > >>>>>>> > >>>>>>> A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt > >>>>>>> number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt > >>>>>>> @@ -25,11 +27,11 @@ all vcpus, while as an SPI it must be a separate number per vcpu. > >>>>>>> > >>>>>>> 1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT > >>>>>>> Parameters: no additional parameter in kvm_device_attr.addr > >>>>>>> -Returns: -ENODEV: PMUv3 not supported > >>>>>>> - -ENXIO: PMUv3 not properly configured as required prior to calling this > >>>>>>> - attribute > >>>>>>> +Returns: -ENODEV: PMUv3 not supported or GIC not initialized > >>>>>>> + -ENXIO: PMUv3 not properly configured or in-kernel irqchip not > >>>>>>> + conigured as required prior to calling this attribute > >>>>>>> -EBUSY: PMUv3 already initialized > >>>>>>> > >>>>>>> -Request the initialization of the PMUv3. This must be done after creating the > >>>>>>> -in-kernel irqchip. Creating a PMU with a userspace irqchip is currently not > >>>>>>> -supported. > >>>>>>> +Request the initialization of the PMUv3. If using the PMUv3 with an in-kernel > >>>>>>> +virtual GIC implementation, this must be done after initializing the in-kernel > >>>>>>> +irqchip. > >>>>>>> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c > >>>>>>> index 4b43e7f..f046b08 100644 > >>>>>>> --- a/virt/kvm/arm/pmu.c > >>>>>>> +++ b/virt/kvm/arm/pmu.c > >>>>>>> @@ -456,21 +456,25 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu) > >>>>>>> if (!kvm_arm_support_pmu_v3()) > >>>>>>> return -ENODEV; > >>>>>>> > >>>>>>> - /* > >>>>>>> - * We currently require an in-kernel VGIC to use the PMU emulation, > >>>>>>> - * because we do not support forwarding PMU overflow interrupts to > >>>>>>> - * userspace yet. > >>>>>>> - */ > >>>>>>> - if (!irqchip_in_kernel(vcpu->kvm) || !vgic_initialized(vcpu->kvm)) > >>>>>>> - return -ENODEV; > >>>>>>> - > >>>>>>> - if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features) || > >>>>>>> - !kvm_arm_pmu_irq_initialized(vcpu)) > >>>>>>> + if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features)) > >>>>>>> return -ENXIO; > >>>>>>> > >>>>>>> if (kvm_arm_pmu_v3_ready(vcpu)) > >>>>>>> return -EBUSY; > >>>>>>> > >>>>>>> + if (irqchip_in_kernel(vcpu->kvm)) { > >>>>>>> + /* > >>>>>>> + * If using the PMU with an in-kernel virtual GIC > >>>>>>> + * implementation, we require the GIC to be already > >>>>>>> + * initialized when initializing the PMU. > >>>>>>> + */ > >>>>>>> + if (!vgic_initialized(vcpu->kvm)) > >>>>>>> + return -ENODEV; > >>>>>>> + > >>>>>>> + if (!kvm_arm_pmu_irq_initialized(vcpu)) > >>>>>>> + return -ENXIO; > >>>>>>> + } > >>>>>>> + > >>>>>>> kvm_pmu_vcpu_reset(vcpu); > >>>>>>> vcpu->arch.pmu.ready = true; > >>>>>>> > >>>>>>> @@ -512,6 +516,9 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) > >>>>>>> int __user *uaddr = (int __user *)(long)attr->addr; > >>>>>>> int irq; > >>>>>>> > >>>>>>> + if (!irqchip_in_kernel(vcpu->kvm)) > >>>>>>> + return -EINVAL; > >>>>>>> + > >>>>>> > >>>>>> Shouldn't we fail the same way for {get,has}_attr? get_attr is going to > >>>>>> generate a -ENXIO, and has_attr is going to lie about the availability > >>>>>> of KVM_ARM_VCPU_PMU_V3_IRQ... > >>>>>> > >>>>> > >>>>> Here's the text from api.txt: > >>>>> > >>>>> Tests whether a device supports a particular attribute. A successful > >>>>> return indicates the attribute is implemented. It does not necessarily > >>>>> indicate that the attribute can be read or written in the device's > >>>>> current state. "addr" is ignored. > >>>>> > >>>>> My interpretation therefore is that QEMU can use this ioctl to figure > >>>>> out if the feature is supported (sort of like a capability), but that > >>>>> doesn't mean that the configuration of the VM is such that the attribute > >>>>> can be get or set at that moment. > >>>>> > >>>>> For example, there will also alway be situations where you can get an > >>>>> attr, but not set an attr, what should the has_attr return then? > >>>> > >>>> My issue here is that whether we can get/set the interrupt or not is not > >>>> a function of the device itself, but of the way it is "wired". No matter > >>>> what "the device's current state" is, we'll never be able to get/set the > >>>> interrupt. > >>>> > >>>> I'd tend to err on the side of caution and return something that is > >>>> unambiguous, be maybe I have too strict an interpretation of the API. > >>>> > >>> > >>> Hmm, I see the has_attr as a method for userspace to discover "does this > >>> kernel have this feature". If we make has_attr return a value depending > >>> on the VM having an in-kernel gic or not, we implicitly require > >>> userspace to create a VM with an in-kernel GIC to discover if this > >>> kernel has that feature, and therefore also impose an ordering > >>> requirement of figuring out the capabilities of the kernel (i.e. create > >>> the GIC before checking this API). > >>> > >>> I think QEMU should be able to do: > >>> > >>> if (has_attr()) { > >>> kvm_supports_set_timer_irq = true; > >>> vtimer_irq = foo; > >>> } else { > >>> kvm_supports_set_timer_irq = false; > >>> vtimer_irq = 27; /* default, we're stuck with it */ > >>> } > >>> > >>> create_board_definition(); > >>> create_dt(); > >>> create_acpi(); > >>> > >>> /* do whatever */ > >>> > >>> if (kvm_supports_set_timer_irq && kvm_irqchip_in_kernel()) { > >>> kvm_arm_timer_set_irq(...); > >>> } > >>> > >>> And all this should not be coupled to when we create the irqchip device. > >>> > >>> But I may be failing to see the case where the current implementation > >>> creates a problem for userspace, in which case we should document the > >>> ordering requirement. > >> > >> I'm not sure it would create any problem, at least not for the PMU > >> (there is no working code that would have created a PMU without an irqchip). > >> > >> It is just that we have a slightly diverging interpretation of what > >> has_attr means. You see it as "attribute that the device supports", and > >> I see it as "attribute that the device supports in this configuration". > >> I'm happy to use your semantics from now on. > > > > In either case, we should make sure this is clear in the ABI. I thought > > that the "It does not necessarily indicate that the attribute can be > > read or written in the device's current state." implied my > > interpretation, but maybe I'm missing some subtlety there? > > Well, that's what I said above. The interrupt number is not a function > of the device state, but one of the integration of that device in the > system. The PMU itself is not concerned with an interrupt number, only > the GIC is. > > > Do you think we should clarify the API? > > Not sure. I admit my interpretation is a bit borderline. If I was brave, > I'd say that all the interrupt numbers should be a property of the GIC, > and not of the device. But that ship has sailed a while ago, and I'm > weak ;-). > > More seriously, I don't think this is likely to cause any issue. > ok, I'll leave it as is then. > > > > By the way, I now realize that we are not maintining the same > > understanding between get/set_attr, which I really think we should, so > > I'll add the following: > > > > > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c > > index 9b3e3ea..0cf62b7 100644 > > --- a/virt/kvm/arm/pmu.c > > +++ b/virt/kvm/arm/pmu.c > > @@ -551,6 +551,9 @@ int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) > > int __user *uaddr = (int __user *)(long)attr->addr; > > int irq; > > > > + if (!irqchip_in_kernel(vcpu->kvm)) > > + return -EINVAL; > > + > > if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features)) > > return -ENODEV; > > > > Looks good to me. > Thanks, -Christoffer