Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2009 at 02:54:40PM -0400, Gregory Haskins wrote:
>> Michael S. Tsirkin wrote:
>>> On Mon, Oct 26, 2009 at 12:22:08PM -0400, Gregory Haskins wrote:
>>>> IRQFD currently uses a deferred workqueue item to execute the injection
>>>> operation.  It was originally designed this way because kvm_set_irq()
>>>> required the caller to hold the irq_lock mutex, and the eventfd callback
>>>> is invoked from within a non-preemptible critical section.
>>>>
>>>> With the advent of lockless injection support for certain GSIs, the
>>>> deferment mechanism is no longer technically needed in all cases.
>>>> Since context switching to the workqueue is a source of interrupt
>>>> latency, let's switch to a direct method whenever possible.  Fortunately
>>>> for us, the most common use of irqfd (MSI-based GSIs) readily supports
>>>> lockless injection.
>>>>
>>>> Signed-off-by: Gregory Haskins <ghaskins@xxxxxxxxxx>
>>>
>>> This is a useful optimization, I think.
>>> Some comments below.
>>>
>>>> ---
>>>>
>>>>  virt/kvm/eventfd.c |   31 +++++++++++++++++++++++++++----
>>>>  1 files changed, 27 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>>>> index 30f70fd..e6cc958 100644
>>>> --- a/virt/kvm/eventfd.c
>>>> +++ b/virt/kvm/eventfd.c
>>>> @@ -51,20 +51,34 @@ struct _irqfd {
>>>>  	wait_queue_t          wait;
>>>>  	struct work_struct    inject;
>>>>  	struct work_struct    shutdown;
>>>> +	void (*execute)(struct _irqfd *);
>>>>  };
>>>>
>>>>  static struct workqueue_struct *irqfd_cleanup_wq;
>>>>
>>>>  static void
>>>> -irqfd_inject(struct work_struct *work)
>>>> +irqfd_inject(struct _irqfd *irqfd)
>>>>  {
>>>> -	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
>>>>  	struct kvm *kvm = irqfd->kvm;
>>>>
>>>>  	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
>>>>  	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
>>>>  }
>>>>
>>>> +static void
>>>> +irqfd_deferred_inject(struct work_struct *work)
>>>> +{
>>>> +	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
>>>> +
>>>> +	irqfd_inject(irqfd);
>>>> +}
>>>> +
>>>> +static void
>>>> +irqfd_schedule(struct _irqfd *irqfd)
>>>> +{
>>>> +	schedule_work(&irqfd->inject);
>>>> +}
>>>> +
>>>>  /*
>>>>   * Race-free decouple logic (ordering is critical)
>>>>   */
>>>> @@ -126,7 +140,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>>>>
>>>>  	if (flags & POLLIN)
>>>>  		/* An event has been signaled, inject an interrupt */
>>>> -		schedule_work(&irqfd->inject);
>>>> +		irqfd->execute(irqfd);
>>>>
>>>>  	if (flags & POLLHUP) {
>>>>  		/* The eventfd is closing, detach from KVM */
>>>> @@ -179,7 +193,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
>>>>  	irqfd->kvm = kvm;
>>>>  	irqfd->gsi = gsi;
>>>>  	INIT_LIST_HEAD(&irqfd->list);
>>>> -	INIT_WORK(&irqfd->inject, irqfd_inject);
>>>> +	INIT_WORK(&irqfd->inject, irqfd_deferred_inject);
>>>>  	INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
>>>>
>>>>  	file = eventfd_fget(fd);
>>>> @@ -209,6 +223,15 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
>>>>  	list_add_tail(&irqfd->list, &kvm->irqfds.items);
>>>>  	spin_unlock_irq(&kvm->irqfds.lock);
>>>>
>>>> +	ret = kvm_irq_check_lockless(kvm, gsi);
>>>> +	if (ret < 0)
>>>> +		goto fail;
>>>> +
>>>> +	if (ret)
>>>> +		irqfd->execute = &irqfd_inject;
>>>> +	else
>>>> +		irqfd->execute = &irqfd_schedule;
>>>> +
>>>
>>> Can't gsi get converted from lockless to non-lockless
>>> after it's checked (by the routing ioctl)?
>> I think I protect against this in patch 2/3 by ensuring that any vectors
>> that are added have to conform to the same locking rules.  The code
>> doesn't support deleting routes, so we really only need to make sure
>> that new routes do not change.
>
> What I refer to is when userspace calls KVM_SET_GSI_ROUTING.
> I don't see how your patch helps here: can't a GSI formerly
> used for MSI become unused, and then reused for non-MSI?
> If not, it's a problem I think, because I think userspace
> currently does this sometimes.

I see your point.  I was thinking vectors could only be added, not
deleted, but I see upon further inspection that is not the case.

>>> Kernel will crash then.
>>>
>>> How about: each time we get an event from the eventfd, we implement
>>> kvm_irqfd_toggle_lockless, which does a single scan and returns a
>>> true/false status (and I really mean toggle, let's not do set 1 / set 0
>>> as well) telling us whether interrupts could be delivered in a lockless
>>> manner?
>>
>> I am not sure I like this idea in general, given that I believe I already
>> handle the error case you are concerned with.
>>
>> However, the concept of providing a "toggle" option so we can avoid
>> scanning the list twice is a good one.  That can be done as a new patch
>> series, but it would be a nice addition.
>>
>> Thanks Michael,
>> -Greg
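For concreteness, here is a rough sketch of the "toggle" idea floated above.
It is illustrative only: kvm_irqfd_toggle_lockless() does not exist in the
kernel, and the routing-entry fields and list walk below are assumptions
modeled loosely on the KVM internals this thread refers to (kvm_set_irq(),
KVM_USERSPACE_IRQ_SOURCE_ID, and the MSI routing type are real).  The sketch
further assumes a GSI's routes are homogeneous (all MSI or none), so that
lockless-safety can be decided and the raise/lower edge delivered in the same
pass.  Re-checking on every event under the RCU read lock is what closes the
KVM_SET_GSI_ROUTING race.

```c
/*
 * Hypothetical sketch, not actual KVM API.  Walk the *current*
 * routing for @gsi under the RCU read lock; if the route is
 * lockless-safe (MSI), deliver the edge (raise then lower) within
 * this single scan and return true.  Otherwise return false so the
 * caller can fall back to the deferred workqueue path.  Assumes all
 * routes for a GSI share one type, so we never bail out after having
 * already injected part of the set.
 */
static bool
kvm_irqfd_toggle_lockless(struct kvm *kvm, int gsi)
{
	struct kvm_kernel_irq_routing_entry *e;
	bool handled = false;

	rcu_read_lock();

	/* Assumed: an RCU-protected list of routing entries on @kvm */
	list_for_each_entry_rcu(e, &kvm->irq_routing, link) {
		if (e->gsi != gsi)
			continue;

		if (e->type != KVM_IRQ_ROUTING_MSI) {
			/*
			 * The route changed under us (e.g. via
			 * KVM_SET_GSI_ROUTING) and is no longer
			 * lockless-safe: defer to the workqueue.
			 */
			handled = false;
			break;
		}

		/* Single-scan toggle: raise and lower in one pass */
		e->set(e, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
		e->set(e, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 0);
		handled = true;
	}

	rcu_read_unlock();

	return handled;
}
```

The irqfd_wakeup() callback would then no longer need the execute pointer
chosen at assign time; it could simply fall back to the workqueue whenever
the toggle reports that direct delivery was not possible:

```c
	if (flags & POLLIN) {
		/* Check lockless-safety per event, not once at assign time */
		if (!kvm_irqfd_toggle_lockless(irqfd->kvm, irqfd->gsi))
			schedule_work(&irqfd->inject);
	}
```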