On Mon, Jul 13, 2009 at 10:02:13AM -0400, Gregory Haskins wrote:
> Gleb Natapov wrote:
> > On Mon, Jul 13, 2009 at 09:40:01AM -0400, Gregory Haskins wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Jul 13, 2009 at 09:26:21AM -0400, Gregory Haskins wrote:
> >>>> Gleb Natapov wrote:
> >>>>> On Mon, Jul 13, 2009 at 04:02:56PM +0300, Michael S. Tsirkin wrote:
> >>>>>> On Sun, Jul 12, 2009 at 03:03:53PM +0300, Gleb Natapov wrote:
> >>>>>>> Use RCU locking for mask/ack notifiers lists.
> >>>>>>>
> >>>>>>> Signed-off-by: Gleb Natapov <gleb@xxxxxxxxxx>
> >>>>>>> ---
> >>>>>>>  virt/kvm/irq_comm.c |   20 +++++++++++---------
> >>>>>>>  1 files changed, 11 insertions(+), 9 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> >>>>>>> index 5dde1ef..ba3a115 100644
> >>>>>>> --- a/virt/kvm/irq_comm.c
> >>>>>>> +++ b/virt/kvm/irq_comm.c
> >>>>>>> @@ -179,18 +179,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> >>>>>>>  			break;
> >>>>>>>  		}
> >>>>>>>  	}
> >>>>>>> -	rcu_read_unlock();
> >>>>>>>
> >>>>>>> -	hlist_for_each_entry(kian, n, &kvm->irq_ack_notifier_list, link)
> >>>>>>> +	hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list, link)
> >>>>>>>  		if (kian->gsi == gsi)
> >>>>>>>  			kian->irq_acked(kian);
> >>>>>>> +	rcu_read_unlock();
> >>>>>>>  }
> >>>>>>>
> >>>>>>>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
> >>>>>>>  				   struct kvm_irq_ack_notifier *kian)
> >>>>>>>  {
> >>>>>>>  	mutex_lock(&kvm->irq_lock);
> >>>>>>> -	hlist_add_head(&kian->link, &kvm->irq_ack_notifier_list);
> >>>>>>> +	hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
> >>>>>>>  	mutex_unlock(&kvm->irq_lock);
> >>>>>>>  }
> >>>>>>>
> >>>>>>> @@ -198,8 +198,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> >>>>>>>  				     struct kvm_irq_ack_notifier *kian)
> >>>>>>>  {
> >>>>>>>  	mutex_lock(&kvm->irq_lock);
> >>>>>>> -	hlist_del_init(&kian->link);
> >>>>>>> +	hlist_del_init_rcu(&kian->link);
> >>>>>>>  	mutex_unlock(&kvm->irq_lock);
> >>>>>>> +	synchronize_rcu();
> >>>>>>>  }
> >>>>>>
> >>>>>> This is done under kvm->lock still, which means the lock might be held
> >>>>>> potentially for a very long time. Can synchronize_rcu be moved out of
> >>>>>> this lock?
> >>>>>
> >>>>> Only if kvm_free_assigned_device() will be moved out of this lock.
> >>>>> Device de-assignment is not a very frequent event, though. How long do
> >>>>> you think it may be held? KVM RCU read sections are very brief.
> >>>>
> >>>> Note that the delay imposed by the barrier is not only related to the
> >>>> length of the critical section. The barrier blocks until the next grace
> >>>> period, and depending on the type of RCU you are using and your config
> >>>> options, this could be multiple milliseconds.
> >>>>
> >>>> I am not saying that this is definitely a problem for your design. I
> >>>> am just pointing out that the length of the KVM-RCU read section is only
> >>>
> >>> Yeah, I understand that other RCU read sections may introduce delays too.
> >>> The question is how big the delay may be.
> >>
> >> I think you are misunderstanding me. The read-side CS is not a
> >> significant factor here, so I am not worried about concurrent read-side
> >> CS causing a longer delay. What I am saying is that the grace period of
> >> your RCU subsystem is the dominant factor in the equation here, and this
> >> may be several milliseconds.
> >
> > How is the "grace period" determined? Doesn't it just mean "no CPU is
> > in an RCU read section anymore"?
>
> Nope ;)

Now I recall something about each CPU passing through the scheduler. Thanks.

> RCU is pretty complex, so I won't even try to explain it here, as there
> are numerous articles floating around out there that do a much better job.
> But here is a summary. RCU buys you two things: 1) concurrent readers
> *and* writers, and 2) a much lower overhead reader path, because it
> generally doesn't use atomics. It's point (2) that is relevant here.
>
> If taking an atomic were ok, you could approximate the RCU model using
> reference counting. Reference counting buys you "precise" resource
> acquisition/release at the expense of the overhead of the atomic
> operation (and any associated cache-line bouncing). RCU uses an
> "imprecise" model where we don't really know the *exact* moment the
> resource is released. Instead, there are specific boundaries in time
> when we can guarantee that it had to have been released prior to the
> expiry of the event. This event is what is called the "grace period".
>
> So that is what synchronize_rcu() is doing. It's a barrier to the next
> imprecise moment in time when we can be assured (if you used the rest of
> the RCU API properly) that there cannot be any outstanding references
> to your object left in flight. Each grace period can be milliseconds,
> depending on what version of the kernel you have and how it is configured.
>
> HTH
>
> Kind Regards,
> -Greg
>
> >>> I don't think multiple milliseconds delay in device de-assignment
> >>> is a big issue though.
> >>
> >> I would tend to agree with you. It's not a fast path.
> >>
> >> I only brought this up because I saw your design being justified
> >> incorrectly: you said "KVM RCU read sections are very brief", but that
> >> is not really relevant to Michael's point. I just want to make sure
> >> that the true impact is understood.
> >>
> >> Kind Regards,
> >> -Greg
> >
> > --
> > 			Gleb.

--
			Gleb.