On Mon, Jul 13, 2009 at 10:02:13AM -0400, Gregory Haskins wrote:
> Gleb Natapov wrote:
> > On Mon, Jul 13, 2009 at 09:40:01AM -0400, Gregory Haskins wrote:
> >> Gleb Natapov wrote:
> >>> On Mon, Jul 13, 2009 at 09:26:21AM -0400, Gregory Haskins wrote:
> >>>> Gleb Natapov wrote:
> >>>>> On Mon, Jul 13, 2009 at 04:02:56PM +0300, Michael S. Tsirkin wrote:
> >>>>>> On Sun, Jul 12, 2009 at 03:03:53PM +0300, Gleb Natapov wrote:
> >>>>>>> Use RCU locking for mask/ack notifiers lists.
> >>>>>>>
> >>>>>>> Signed-off-by: Gleb Natapov <gleb@xxxxxxxxxx>
> >>>>>>> ---
> >>>>>>>  virt/kvm/irq_comm.c |   20 +++++++++++---------
> >>>>>>>  1 files changed, 11 insertions(+), 9 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> >>>>>>> index 5dde1ef..ba3a115 100644
> >>>>>>> --- a/virt/kvm/irq_comm.c
> >>>>>>> +++ b/virt/kvm/irq_comm.c
> >>>>>>> @@ -179,18 +179,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> >>>>>>>  			break;
> >>>>>>>  		}
> >>>>>>>  	}
> >>>>>>> -	rcu_read_unlock();
> >>>>>>>
> >>>>>>> -	hlist_for_each_entry(kian, n, &kvm->irq_ack_notifier_list, link)
> >>>>>>> +	hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list, link)
> >>>>>>>  		if (kian->gsi == gsi)
> >>>>>>>  			kian->irq_acked(kian);
> >>>>>>> +	rcu_read_unlock();
> >>>>>>>  }
> >>>>>>>
> >>>>>>>  void kvm_register_irq_ack_notifier(struct kvm *kvm,
> >>>>>>>  				   struct kvm_irq_ack_notifier *kian)
> >>>>>>>  {
> >>>>>>>  	mutex_lock(&kvm->irq_lock);
> >>>>>>> -	hlist_add_head(&kian->link, &kvm->irq_ack_notifier_list);
> >>>>>>> +	hlist_add_head_rcu(&kian->link, &kvm->irq_ack_notifier_list);
> >>>>>>>  	mutex_unlock(&kvm->irq_lock);
> >>>>>>>  }
> >>>>>>>
> >>>>>>> @@ -198,8 +198,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> >>>>>>>  				     struct kvm_irq_ack_notifier *kian)
> >>>>>>>  {
> >>>>>>>  	mutex_lock(&kvm->irq_lock);
> >>>>>>> -	hlist_del_init(&kian->link);
> >>>>>>> +	hlist_del_init_rcu(&kian->link);
> >>>>>>>  	mutex_unlock(&kvm->irq_lock);
> >>>>>>> +	synchronize_rcu();
> >>>>>>>  }
> >>>>>>
> >>>>>> This is done under kvm->lock still, which means the lock might be held
> >>>>>> potentially for a very long time. Can synchronize_rcu be moved out of
> >>>>>> this lock?
> >>>>>
> >>>>> Only if kvm_free_assigned_device() will be moved out of this lock.
> >>>>> Device de-assignment is not a very frequent event, though. How long do
> >>>>> you think it may be held? KVM RCU read sections are very brief.
> >>>>
> >>>> Note that the delay imposed by the barrier is not only related to the
> >>>> length of the critical section. The barrier blocks until the next grace
> >>>> period, and depending on the type of RCU you are using and your config
> >>>> options, this could be multiple milliseconds.
> >>>>
> >>>> I am not saying that this is definitely a problem for your design. I
> >>>> am just pointing out that the length of the KVM-RCU read section is only
> >>>
> >>> Yeah, I understand that other RCU read sections may introduce delays too.
> >>> The question is how big the delay may be.
> >>
> >> I think you are misunderstanding me. The read-side CS is not a
> >> significant factor here, so I am not worried about concurrent read-side
> >> CS causing a longer delay. What I am saying is that the grace period of
> >> your RCU subsystem is the dominant factor in the equation here, and this
> >> may be several milliseconds.
> >
> > How is the "grace period" determined? Doesn't it just mean "no CPU is
> > in an RCU read section anymore"?
>
> Nope ;)

Now I recall something about each CPU passing through the scheduler. Thanks.

> RCU is pretty complex, so I won't even try to explain it here, as there
> are numerous articles floating around out there that do a much better job.
> But here is a summary. RCU buys you two things: 1) concurrent readers
> *and* writers, and 2) a much lower overhead reader path, because it
> generally doesn't use atomics. It's point (2) that is relevant here.
>
> If taking an atomic were ok, you could approximate the RCU model using
> reference counting. Reference counting buys you "precise" resource
> acquisition/release at the expense of the overhead of the atomic
> operation (and any associated cache-line bouncing). RCU uses an
> "imprecise" model where we don't really know the *exact* moment the
> resource is released. Instead, there are specific boundaries in time
> when we can guarantee that it had to have been released prior to the
> expiry of the event. This event is what is called the "grace period".
>
> So that is what synchronize_rcu() is doing. It's a barrier to the next
> imprecise moment in time when we can be assured (if you used the rest of
> the RCU API properly) that there cannot be any outstanding references
> to your object left in flight. Each grace period can be milliseconds,
> depending on what version of the kernel you have and how it is configured.
>
> HTH
>
> Kind Regards,
> -Greg
>
> >>> I don't think multiple milliseconds delay in device de-assignment
> >>> is a big issue though.
> >>
> >> I would tend to agree with you. It's not a fast path.
> >>
> >> I only brought this up because I saw your design being justified
> >> incorrectly: you said "KVM RCU read sections are very brief", but that
> >> is not really relevant to Michael's point. I just want to make sure
> >> that the true impact is understood.
> >>
> >> Kind Regards,
> >> -Greg
> >
> > --
> > 			Gleb.

--
			Gleb.