Hej,

On 06/04/16 15:23, Christoffer Dall wrote:
> On Tue, Apr 05, 2016 at 06:28:55PM +0100, Andre Przywara wrote:
>> Hi,
>>
>> On 29/03/16 22:16, Christoffer Dall wrote:
>>> On Fri, Mar 25, 2016 at 02:04:29AM +0000, Andre Przywara wrote:
>>>> From: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
>>>>
>>>> Provide a vgic_queue_irq() function which decides whether a given
>>>> IRQ needs to be queued to a VCPU's ap_list.
>>>> This should be called whenever an IRQ became pending or got enabled,
>>>
>>> becomes pending or enabled,
>>>
>>>> either as a result of userspace injection, from in-kernel emulated
>>>> devices like the architected timer or from MMIO accesses to the
>>>> distributor emulation.
>>>> Also provides the necessary functions to allow userland to inject an
>>>> IRQ to a guest.
>>>
>>> Since this is the first code that starts using our locking mechanism, we
>>> add some (hopefully) clear documentation of our locking strategy and
>>> requirements along with this patch.
>>>
>>>> [Andre: refactor out vgic_queue_irq()]
>>>>
>>>> Signed-off-by: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
>>>> Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
>>>> ---
>>>>  include/kvm/vgic/vgic.h  |   3 +
>>>>  virt/kvm/arm/vgic/vgic.c | 181 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>  virt/kvm/arm/vgic/vgic.h |   1 +
>>>>  3 files changed, 185 insertions(+)
>>>>
>>>> diff --git a/include/kvm/vgic/vgic.h b/include/kvm/vgic/vgic.h
>>>> index 659f8b1..f32b284 100644
>>>> --- a/include/kvm/vgic/vgic.h
>>>> +++ b/include/kvm/vgic/vgic.h
>>>> @@ -178,6 +178,9 @@ struct vgic_cpu {
>>>>  	struct list_head ap_list_head;
>>>>  };
>>>>
>>>> +int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>>>> +			bool level);
>>>> +
>>>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>>>>  #define vgic_initialized(k)	(false)
>>>>  #define vgic_ready(k)		((k)->arch.vgic.ready)
>>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>>> index 8e34916..a95aabc 100644
>>>> --- a/virt/kvm/arm/vgic/vgic.c
>>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>>> @@ -19,8 +19,25 @@
>>>>
>>>>  #include "vgic.h"
>>>>
>>>> +#define CREATE_TRACE_POINTS
>>>> +#include "../trace.h"
>>>> +
>>>>  struct vgic_global kvm_vgic_global_state;
>>>>
>>>> +/*
>>>> + * Locking order is always:
>>>> + *   vgic_cpu->ap_list_lock
>>>> + *     vgic_irq->irq_lock
>>>> + *
>>>> + * (that is, always take the ap_list_lock before the struct vgic_irq lock).
>>>> + *
>>>> + * When taking more than one ap_list_lock at the same time, always take the
>>>> + * lowest numbered VCPU's ap_list_lock first, so:
>>>> + *   vcpuX->vcpu_id < vcpuY->vcpu_id:
>>>> + *     spin_lock(vcpuX->arch.vgic_cpu.ap_list_lock);
>>>> + *     spin_lock(vcpuY->arch.vgic_cpu.ap_list_lock);
>>>> + */
>>>> +
>>>>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>> 			      u32 intid)
>>>>  {
>>>> @@ -39,3 +56,167 @@ struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>>  	WARN(1, "Looking up struct vgic_irq for reserved INTID");
>>>>  	return NULL;
>>>>  }
>>>> +
>>>> +/**
>>>> + * kvm_vgic_target_oracle - compute the target vcpu for an irq
>>>> + *
>>>> + * @irq:	The irq to route. Must be already locked.
>>
>>                                 ^^^^^^^^^^^^^^^^^^^^^^
>>
>>>> + *
>>>> + * Based on the current state of the interrupt (enabled, pending,
>>>> + * active, vcpu and target_vcpu), compute the next vcpu this should be
>>>> + * given to. Return NULL if this shouldn't be injected at all.
>>>> + */
>>>> +static struct kvm_vcpu *vgic_target_oracle(struct vgic_irq *irq)
>>>> +{
>>>> +	/* If the interrupt is active, it must stay on the current vcpu */
>>>> +	if (irq->active)
>>>> +		return irq->vcpu;
>>>
>>> we are not taking a lock here. What are the locking expectations? If
>>> the expectations are that the IRQ is locked when calling this function,
>>> can we have a BIG FAT COMMENT saying that then?
>>
>> Do you mean really BIG FAT or is the above sufficient? (I guess not).
>> I will make it more prominent.
>
> well, maybe it doesn't have to be BIG FAT. But I did miss the existing
> comment. I think it would be preferred to have a separate paragraph
> explaining the locking expectations, but perhaps I'm just
> being stupid.

Fixed - not you being stupid (which you clearly aren't, so nothing to
fix there), but the insufficient comment ;-)

>>
>>> It seems to me that we are somehow expecting irq->active and irq->vcpu
>>> to be in sync, but that's not necessarily the case if the IRQ is not
>>> locked.
>>>
>>>> +
>>>> +	/* If enabled and pending, it can migrate to a new one */
>>>
>>> I think this comment should be rewritten to:
>>>
>>> If the IRQ is not active but enabled and pending, we should direct it to
>>> its configured target VCPU.
>>>
>>>> +	if (irq->enabled && irq->pending)
>>>> +		return irq->target_vcpu;
>>>> +
>>>> +	/* Otherwise, it is considered idle */
>>>
>>> not sure what idle means here, I suggest something like:
>>>
>>> If neither active nor pending and enabled, then this IRQ should not be
>>> queued to any VCPU.
>>>
>>>> +	return NULL;
>>>> +}
>>>> +
>>>> +/*
>>>> + * Only valid injection if changing level for level-triggered IRQs or for a
>>>> + * rising edge.
>>>> + */
>>>> +static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
>>>> +{
>>>> +	switch (irq->config) {
>>>> +	case VGIC_CONFIG_LEVEL:
>>>> +		return irq->line_level != level;
>>>> +	case VGIC_CONFIG_EDGE:
>>>> +		return level;
>>>> +	default:
>>>> +		BUG();
>>>
>>> is the default case there for making the compiler happy or can we just
>>> get rid of it?
>>
>> Just removing it was fine (for GCC 5.3.0, at least).
>>
>>>> +	}
>>>> +}
>>>> +
>>>> +/*
>>>> + * Check whether an IRQ needs to (and can) be queued to a VCPU's ap list.
>>>> + * Do the queuing if necessary, taking the right locks in the right order.
>>>> + * Returns true when the IRQ was queued, false otherwise.
>>>> + *
>>>> + * Needs to be entered with the IRQ lock already held, but will return
>>>> + * with all locks dropped.
>>>> + */
>>>> +bool vgic_queue_irq(struct kvm *kvm, struct vgic_irq *irq)
>>>
>>> should we name this vgic_try_queue_irq_locked ?
>>
>> Mmh, since it (re-)tries quite hard, _try_ might be misleading.
>> Basically it queues the IRQ whenever possible and/or sensible.
>> Having _unlock in it like you suggested in another reply makes
>> more sense, I think.
>
> agreed
>
>>
>>>> +{
>>>> +	struct kvm_vcpu *vcpu = vgic_target_oracle(irq);
>>>
>>> should we have something like BUG_ON(!spin_is_locked(irq->irq_lock));
>>> here?
>>>
>>> Not sure if there's some bug checking here which is only emitted if a
>>> user selects CONFIG_CHECK_SOME_LOCKING_THINGS that we could use...?
>>
>> There is CONFIG_DEBUG_SPINLOCK, but I couldn't find a conditional
>> debug macro suitable for the purpose. I defined one now for this file
>> only (since we have quite some users here).
>>
>>>> +
>>>> +	if (irq->vcpu || !(irq->pending && irq->enabled) || !vcpu) {
>>>> +		/*
>>>> +		 * If this IRQ is already on a VCPU's ap_list, then it
>>>> +		 * cannot be moved or modified and there is no more work for
>>>> +		 * us to do.
>>>> +		 *
>>>> +		 * Otherwise, if the irq is not pending and enabled, it does
>>>> +		 * not need to be inserted into an ap_list and there is also
>>>> +		 * no more work for us to do.
>>>> +		 */
>>>
>>> is the !vcpu check here not redundant because if you ever get to
>>> evaluating it, then irq->vcpu is null, and pending and enabled are set,
>>> which means the oracle couldn't have returned null, could it?
>>
>> In this case vcpu is always irq->target_vcpu, if I did the math
>> correctly. So can this be NULL?
>> Even if this is correct reasoning, I wonder if we optimize something
>> prematurely here and rely on the current implementation of
>> vgic_target_oracle().
>> I think the check for "!vcpu" is here to avoid a
>> NULL pointer dereference below (in the first spin_lock after the retry:
>> label), so I'd rather keep this explicit check in here.
>
> I'm really not a fan of building the correctness of one of the most
> crucial parts of our code based on "let's add a few extra checks which
> may not be necessary, just in case" kind of logic.
>
> So let's be clear on why we have an if-statement here exactly:
>
> As the comment says, if we can't move the IRQ, because it's already
> assigned to somebody or if this IRQ is not pending or active, then it
> shouldn't be queued.
>
> So the simple and all-encompassing check here is simply:
>
> 	if (irq->vcpu || !vcpu) {
> 		spin_unlock(&irq->irq_lock);
> 		return false;
> 	}
>
> The only requirement for this to be correct is that the MMIO handler for
> ISACTIVER both sets the active bit and the irq->vcpu pointer (and puts
> it on the AP list), without calling this function. That was my
> question below.
>
> Because if that's not the case, you could end up here with irq->active
> set, but irq->vcpu == NULL and !(pending && enabled) and you'd error
> out, which means you would have to check explicitly for the active state
> here as well, but I think that just becomes too messy.
>
> So, just change this to what I propose and we can deal with the active
> state MMIO handler separately.

I agree that setting the active state via MMIO is a mess in general and
stuffing this case into this function here gets hairy. I am tempted to
not support it in the first version; I guess it never really worked
reliably before ...
At the moment I am trying to code this explicitly into the SACTIVER
handler, and it's messy, too (because of the corner cases).
Let's see how this will look ...

>>
>>> that would also explain why we don't have to re-check the same
>>> conditions below...
>>>
>>> or am I getting this wrong, because you could also have someone
>>> explicitly setting the IRQ to active via trapped MMIO, in which case we
>>> should be able to queue it without it being pending && enabled, which
>>> would indicate that it's the other way around, you should only evaluate
>>> !vcpu and keep the !(pending && enabled) part....?
>>
>> You lost me here, which hints at the fragility of this optimization ;-)
>>
>>>> +		spin_unlock(&irq->irq_lock);
>>>> +		return false;
>>>> +	}
>>>> +
>>>> +	/*
>>>> +	 * We must unlock the irq lock to take the ap_list_lock where
>>>> +	 * we are going to insert this new pending interrupt.
>>>> +	 */
>>>> +	spin_unlock(&irq->irq_lock);
>>>> +
>>>> +	/* someone can do stuff here, which we re-check below */
>>>> +retry:
>>>> +	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
>>>> +	spin_lock(&irq->irq_lock);
>>>> +
>>>> +	/*
>>>> +	 * Did something change behind our backs?
>>>> +	 *
>>>> +	 * There are two cases:
>>>> +	 * 1) The irq became pending or active behind our backs and/or
>>>> +	 *    the irq->vcpu field was set correspondingly when putting
>>>> +	 *    the irq on an ap_list. Then drop the locks and return.
>>>> +	 * 2) Someone changed the affinity on this irq behind our
>>>> +	 *    backs and we are now holding the wrong ap_list_lock.
>>>> +	 *    Then drop the locks and try the new VCPU.
>>>> +	 */
>>>> +	if (irq->vcpu || !(irq->pending && irq->enabled)) {
>>>
>>> here I'm concerned about the active state again.
>>
>> Mmmh, can you elaborate and sketch a case where the active state would
>> cause trouble? This check is just here to avoid iterating on a no longer
>> pending or enabled IRQ. I wonder if an active IRQ can really sneak into
>> this function here in the first place?
>
> After having gone through the series I think we should deal with
> the active state queuing directly in the vgic_mmio_write_sactive()
> function.
>
> But I still prefer to move the retry label to the very top of this
> function, and simplify these two statements to the condition I suggested:
>
> 	if (unlikely(irq->vcpu || vcpu != vgic_target_oracle(irq)))
> 		goto retry;
>
> The cost is that we perform a few additional checks at runtime in the
> case where the IRQ was migrated while we released a lock (rare), but I
> think it simplifies the code.

OK, I made this change. Also the shorter check after asking the oracle
above. This should also work better in the case where target_vcpu is
NULL (because either no bit in TARGETSR is set or a non-existent MPIDR
has been written into IROUTER).

Cheers,
Andre.

>>
>>> I feel like something more similar to my initial version of this patch
>>> is what we really want:
>>>
>>> 	if (irq->vcpu || vcpu != vgic_target_oracle(irq))
>>> 		goto real_retry;
>>>
>>> and real_retry is then a label at the very top of this function, before
>>> the initial call to vgic_target_oracle()....
>>>
>>>> +		spin_unlock(&irq->irq_lock);
>>>> +		spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>>>> +		return false;
>>>> +	}
>>>> +
>>>> +	if (irq->target_vcpu != vcpu) {
>>>> +		spin_unlock(&irq->irq_lock);
>>>> +		spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>>>> +
>>>> +		vcpu = irq->target_vcpu;
>>>> +		goto retry;
>>>> +	}
>>>> +
>>>> +	list_add_tail(&irq->ap_list, &vcpu->arch.vgic_cpu.ap_list_head);
>>>> +	irq->vcpu = vcpu;
>>>> +
>>>> +	spin_unlock(&irq->irq_lock);
>>>> +	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>>>> +
>>>> +	kvm_vcpu_kick(vcpu);
>>>> +
>>>> +	return true;
>>>> +}
>>>> +
>>>> +static void vgic_update_irq_pending(struct kvm *kvm, struct kvm_vcpu *vcpu,
>>>> +				    u32 intid, bool level)
>>>> +{
>>>> +	struct vgic_irq *irq = vgic_get_irq(kvm, vcpu, intid);
>>>> +
>>>> +	trace_vgic_update_irq_pending(vcpu->vcpu_id, intid, level);
>>>> +
>>>> +	BUG_ON(in_interrupt());
>>>
>>> I don't remember why we thought it was a good idea to have this BUG_ON()
>>> anymore. Anyone?
>>
>> Me neither.
>> Is that because of the case where "kvm_notify_acked_irq
>> calls kvm_set_irq" (which in turn may call this function)?
>> I am happy to remove it, also as the old VGIC doesn't seem to have it.
>
> ok, nuke it.
>
>>
>>>> +
>>>> +	spin_lock(&irq->irq_lock);
>>>> +
>>>> +	if (!vgic_validate_injection(irq, level)) {
>>>> +		/* Nothing to see here, move along... */
>>>> +		spin_unlock(&irq->irq_lock);
>>>> +		return;
>>>> +	}
>>>> +
>>>> +	if (irq->config == VGIC_CONFIG_LEVEL) {
>>>> +		irq->line_level = level;
>>>> +		irq->pending = level || irq->soft_pending;
>>>> +	} else {
>>>> +		irq->pending = true;
>>>> +	}
>>>> +
>>>> +	vgic_queue_irq(kvm, irq);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_vgic_inject_irq - Inject an IRQ from a device to the vgic
>>>> + * @kvm:	The VM structure pointer
>>>> + * @cpuid:	The CPU for PPIs
>>>> + * @intid:	The INTID to inject a new state to.
>>>> + *		must not be mapped to a HW interrupt.
>>>
>>> stray line here? I don't understand this bit about 'must not be mapped'
>>> and I think that should be moved to the explanation below with some
>>> rationale, and if important, perhaps guarded with a BUG_ON() ?
>>
>> I think this is a copy&paste leftover from the old VGIC with the old way
>> of handling mapped IRQs. Actually the implementations of
>> kvm_vgic_inject_irq() and kvm_vgic_inject_mapped_irq() are now
>> identical, so the former differentiation does not apply anymore. I will
>> #define the latter to the former for the new VGIC and we should schedule
>> the removal of the "mapped" version when the old VGIC gets removed.
>
> sounds good.
>
>>
>> Btw: Are we OK with marking those cases which deserve some rework after
>> the old VGIC is gone with some kind of TODO comments?
>>
>
> I really think we should avoid merging TODOs as much as possible, but in
> this case it's an exported interface function which could be hard to
> work around with the current vgic, so it may be an exception to the
> rule.
>
> Thanks,
> -Christoffer

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html