Avi Kivity wrote: > Gregory Haskins wrote: >> Avi Kivity wrote: >> >>> Gregory Haskins wrote: >>> >>> >>>> +int >>>> +kvm_irqfd(struct kvm *kvm, int gsi, int flags) >>>> +{ >>>> + struct _irqfd *irqfd; >>>> + struct file *file = NULL; >>>> + int fd = -1; >>>> + int ret; >>>> + >>>> + irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL); >>>> + if (!irqfd) >>>> + return -ENOMEM; >>>> + >>>> + irqfd->kvm = kvm; >>>> >>> You need to increase the refcount on struct kvm here. Otherwise evil >>> userspace will create an irqfd, close the vm and vcpu fds, and inject >>> an interrupt. >>> >> >> I just reviewed the code in prep for v5, and now I remember why I didnt >> take a reference: I designed it the opposite direction: the vm-fd owns >> a reference to the irqfd, and will decouple the kvm context from the >> eventfd on shutdown (see kvm_irqfd_release()). I still need to spin a >> v5 regardless in order to add the padding as previously discussed. But >> let me know if you still see any holes in light of this alternate object >> lifetime approach I am using. >> > > Right, irqfd_release works. But I think refcounting is simpler, since > we already kvm_get_kvm() and kvm_put_kvm(), and you wouldn't need the > irqfd list. On the other hand, I'm not sure you get a callback from > eventfd on close(), so refcounting may not be implementable. ;) > > Drat, irqfd_release doesn't work. You reference kvm->lock in > irqfd_inject without taking any locks. I *think* this is ok, tho. I remove myself from the waitq, and then flush any potentially scheduled deferred work before returning. This all happens synchronously to the vm_release() code when the vm-fd is bring dropped, but before we actually release the struct kvm*. Therefore, I think kvm->lock is guaranteed to remain valid for the duration of the irqfd_release(), and we guarantee it wont be accessed after the irqfd_release() completes. Or do you have a different concern? On this topic of proper ref counts, though.... I wonder if I need an extra fget() in there. I presume that the evenfd_file_create() returns with only a single reference, which presumably I am handing one to userspace, and one to the irqfd.... which is broken. Or does fd_install() bump that for me (doesnt look like it)? Al, Davide, any comments? > > btw, there's still your original idea of creating the eventfd in > userspace and passing it down. That would be workable if we can see a > way to both signal the eventfd and get called back in irq context. > Maybe that's preferable to what we're doing here, but we need to see > how it would work. We can do that, but I don't see it as changing the general problem here. However, I think if you find that the above comments about the kvm->lock w.r.t. irqfd_release() are ok, we don't need to worry about it. If you prefer the userspace allocation of eventfd() for other reasons, we can easily go back to that model as well...but its not strictly necessary for this particular issue iiuc. -Greg
Attachment:
signature.asc
Description: OpenPGP digital signature