On Thu, Apr 25, 2013 at 11:13:40PM +0200, Alexander Graf wrote:
> 
> On 25.04.2013, at 21:03, Scott Wood wrote:
> 
> > On 04/25/2013 09:49:23 AM, Alexander Graf wrote:
> >> On 25.04.2013, at 13:30, Alexander Graf wrote:
> >> 
> >> > On 19.04.2013, at 20:51, Scott Wood wrote:
> >> > 
> >> >> On 04/19/2013 09:06:27 AM, Alexander Graf wrote:
> >> >>> Now that all pieces are in place for reusing generic irq
> >> >>> infrastructure, we can copy x86's implementation of KVM_IRQ_LINE
> >> >>> irq injection and simply reuse it for PPC, as it will work there
> >> >>> just as well.
> >> >>> 
> >> >>> Signed-off-by: Alexander Graf <agraf@xxxxxxx>
> >> >>> ---
> >> >>>  arch/powerpc/include/uapi/asm/kvm.h |    1 +
> >> >>>  arch/powerpc/kvm/powerpc.c          |   13 +++++++++++++
> >> >>>  2 files changed, 14 insertions(+), 0 deletions(-)
> >> >>> 
> >> >>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> >> >>> index 3537bf3..dbb2ac2 100644
> >> >>> --- a/arch/powerpc/include/uapi/asm/kvm.h
> >> >>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> >> >>> @@ -26,6 +26,7 @@
> >> >>>  #define __KVM_HAVE_SPAPR_TCE
> >> >>>  #define __KVM_HAVE_PPC_SMT
> >> >>>  #define __KVM_HAVE_IRQCHIP
> >> >>> +#define __KVM_HAVE_IRQ_LINE
> >> >>> 
> >> >>>  struct kvm_regs {
> >> >>>  	__u64 pc;
> >> >>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> >> >>> index c431fea..874c106 100644
> >> >>> --- a/arch/powerpc/kvm/powerpc.c
> >> >>> +++ b/arch/powerpc/kvm/powerpc.c
> >> >>> @@ -33,6 +33,7 @@
> >> >>>  #include <asm/cputhreads.h>
> >> >>>  #include <asm/irqflags.h>
> >> >>>  #include "timing.h"
> >> >>> +#include "irq.h"
> >> >>>  #include "../mm/mmu_decl.h"
> >> >>> 
> >> >>>  #define CREATE_TRACE_POINTS
> >> >>> @@ -945,6 +946,18 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
> >> >>>  	return 0;
> >> >>>  }
> >> >>> 
> >> >>> +int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
> >> >>> +			  bool line_status)
> >> >>> +{
> >> >>> +	if (!irqchip_in_kernel(kvm))
> >> >>> +		return -ENXIO;
> >> >>> +
> >> >>> +	irq_event->status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
> >> >>> +					irq_event->irq, irq_event->level,
> >> >>> +					line_status);
> >> >>> +	return 0;
> >> >>> +}
> >> >> 
> >> >> As Paul noted in the XICS patchset, this could reference an MPIC
> >> >> that has gone away if the user never attached any vcpus and then
> >> >> closed the MPIC fd.  It's not a reasonable use case, but it could
> >> >> be used maliciously to get the kernel to access a bad pointer.
> >> >> The irqchip_in_kernel check helps somewhat, but it's meant for
> >> >> ensuring that the creation has happened -- it's racy if used for
> >> >> ensuring that destruction hasn't happened.
> >> >> 
> >> >> The problem is rooted in the awkwardness of performing an
> >> >> operation that logically should be on the MPIC fd, but is instead
> >> >> being done on the vm fd.
> >> >> 
> >> >> I think these three steps would fix it (the first two seem like
> >> >> things we should be doing anyway):
> >> >> - During MPIC destruction, make sure MPIC deregisters all routes
> >> >>   that reference it.
> >> >> - In kvm_set_irq(), do not release the RCU read lock until after
> >> >>   the set() function has been called.
> >> >> - Do not hook up kvm_send_userspace_msi() to MPIC or other new
> >> >>   irqchips, as that bypasses the RCU lock.  It could be supported
> >> >>   as a device fd ioctl if desired, or it could be reworked to
> >> >>   operate on an RCU-managed list of MSI handlers, though MPIC
> >> >>   really doesn't need this at all.
> >> > 
> >> > Can't we just add an RCU lock in the send_userspace_msi case? I
> >> > don't think we should handle MSIs any differently from normal
> >> > IRQs.
> > 
> > Well, you can't *just* add the RCU lock -- you need to add data to
> > be managed via RCU (e.g. a list of MSI callbacks, or at least a
> > boolean indicating whether calling the MSI code is OK).
> 
> Well, we'd just access a random pin routing :).
> 
> > 
> >> In fact I'm having a hard time verifying that we're always
> >> accessing things with proper locks held. I'm pretty sure we're
> >> missing a few cases.
> > 
> > Any path in particular?
> 
> I'm already getting confused on whether normal MMIO accesses are
> always safe.

Asserts via mutex_is_locked() and the spinlock/RCU variants might be
helpful.

> >> So how about we delay mpic destruction to vm destruction? We simply
> >> add one user too many when we spawn the mpic and put it on
> >> vm_destruct. That way users _can_ destroy mpics, but they will only
> >> be really freed once the vm is also gone.
> > 
> > That's what we originally had before the fd conversion.  If we want
> > it again, we'll need to go back to maintaining a list of devices in
> > KVM (though it could be a linked list now that we don't need to use
> > it for lookups), or have some hardcoded MPIC hack.
> 
> Well, we could have an anonymous linked list of device pointers with
> a simple registration function. That way it's generic enough for any
> device to be kept alive until vm destruction if it wants that.
> 
> > IIRC I said back then that converting to fd would make destruction
> > ordering more of a pain...
> 
> I usually like to pick the raisins from everything I can. So while I
> like the fd approach for its universally understandable scheme,
> simplicity of use, extensibility of ioctls etc., I don't really like
> the headaches that come with destroying a device while a VM is
> running. So having a device keep itself alive until the VM is gone is
> the best of all worlds :).

The other problem, which arises the moment you allow "get/set device
attribute at any time during VM lifetime" (which this interface
allows), is that synchronization with vcpus must be performed (and you
don't want to take a lock on the vcpu path). So the programmer has to
avoid doing that now. But it's no big deal.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html