Re: Possible race condition during lock_count increment in srcu_read_lock

On Mon, May 29, 2017 at 06:38:19PM +0200, Paolo Bonzini wrote:
> On 29/05/2017 17:50, Paul E. McKenney wrote:
> > On Mon, May 29, 2017 at 05:05:47PM +0200, Paolo Bonzini wrote:
> >> Adding Paul McKenney...
> >>
> >> On 29/05/2017 16:30, Linu Cherian wrote:
> >>> In the KVM irq injection path, irqfd_wakeup in interrupt context
> >>> calls srcu_read_lock(&kvm->irq_srcu). A worker thread, irqfd_inject,
> >>> in process context also calls srcu_read_lock inside kvm_set_irq.
> >>>
> >>> This can lead to a race condition while incrementing the
> >>> lock_count (using __this_cpu_inc), since atomic instructions are
> >>> not used.
> >>>
> >>> Replacing __this_cpu_inc with this_cpu_inc (the atomic version) works
> >>> around the problem, which backs up this analysis.
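
A quick illustration of the hazard being described: on an architecture
without a single memory-to-memory increment instruction, __this_cpu_inc()
expands to a plain load/add/store, so an increment done at process level
can silently discard one done by an interrupt that fires in between.  The
sketch below uses made-up names, not the real srcu_struct layout:

    #include <linux/percpu.h>

    /* Illustrative only: not the real SRCU per-CPU data or fast path. */
    struct fake_srcu_cpu {
            unsigned long lock_count[2];
    };
    static DEFINE_PER_CPU(struct fake_srcu_cpu, fake_srcu_cpu_ref);

    static int fake_srcu_read_lock(int idx)
    {
            unsigned long tmp;

            /* __this_cpu_inc() without a memory-to-memory increment: */
            tmp = __this_cpu_read(fake_srcu_cpu_ref.lock_count[idx]);      /* load  */
            /*
             * If irqfd_wakeup() interrupts here and takes the same SRCU
             * read lock, its increment of lock_count[idx] is overwritten
             * by the store below, so the reader count ends up too low.
             */
            __this_cpu_write(fake_srcu_cpu_ref.lock_count[idx], tmp + 1);  /* store */
            return idx;
    }
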
> >>
> >> Paul, how would you feel about either using this_cpu_inc in
> >> __srcu_read_lock and __srcu_read_unlock?  Many architectures already
> >> provide efficient this_cpu_inc, and any architecture that has LL/SC
> >> could provide it easily.
> > 
> > You actually are not supposed to call __srcu_read_lock() and
> > __srcu_read_unlock() from irq context.  (Unless you know that you
> > interrupted some context that is guaranteed never to use the srcu_struct
> > in question or some such.)
> > 
> > Given that this is the KVM irq injection path, I am guessing that it
> > is executed quite frequently, so that you do need a low-cost highly
> > scalable solution.  However, you only really need to exclude between
> > process level and the various interrupts, so there are three ways
> > that an architecture could handle this:
> > 
> > 1.	Single memory-to-memory increment instructions, as you say.
> > 
> > 2.	Atomic read-modify-write instructions, again as you say.
> > 
> > 3.	Disable interrupts across __srcu_read_lock() and __srcu_read_unlock().
> > 
> > Different architectures would want different approaches, depending on what
> > instructions are available and the relative speed of these instructions.
> > It would not be hard to make this an architecture choice on the one
> > hand or to provide an srcu_read_lock_irq() and srcu_read_unlock_irq()
> > on the other.  But I recently got some serious pushback on RCU API
> > proliferation on the one hand and SRCU in particular on the other.  So
> > it might be good to look at some alternatives as well.
> 
> Yeah, I liked Linu's suggestion of this_cpu_inc because it's already
> used in srcu_read_unlock, and because it can fall back to (3)
> automatically.  Relaxed atomics weren't much of a proposal, rather a way
> to implement this_cpu_inc on some architectures that do not support it
> yet (notably 32-bit ARM, MIPS and PPC).
> 
> The disadvantage would be slowing down SRCU on machines where neither
> (1) nor (2) are possible, and local_irq_save/restore is slower than
> disabling preemption.  But I can certainly wrap ->irq_srcu lock/unlock
> into my own helpers that take care of saving/restoring the interrupt
> state, like
> 
>     #define kvm_irq_srcu_read_lock_irqsave(kvm, flags)
>     #define kvm_irq_srcu_read_unlock_irqrestore(kvm, flags)
> 
> This would be very simple, confined into KVM and respectful of the SRCU
> API requirements.
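
A rough sketch of what such KVM-local wrappers could look like (note that
the unlock side also needs the SRCU index back, so this sketch adds an idx
argument to the two-argument form named above; none of this is from an
actual posted patch):

    #include <linux/irqflags.h>
    #include <linux/srcu.h>

    /* Sketch only: bracket the ->irq_srcu read side with irq save/restore. */
    #define kvm_irq_srcu_read_lock_irqsave(kvm, flags)              \
    ({                                                              \
            int __idx;                                              \
            local_irq_save(flags);                                  \
            __idx = srcu_read_lock(&(kvm)->irq_srcu);               \
            __idx;                                                  \
    })

    #define kvm_irq_srcu_read_unlock_irqrestore(kvm, idx, flags)    \
    do {                                                            \
            srcu_read_unlock(&(kvm)->irq_srcu, (idx));              \
            local_irq_restore(flags);                               \
    } while (0)

A process-level caller would then do idx = kvm_irq_srcu_read_lock_irqsave(kvm,
flags); ... kvm_irq_srcu_read_unlock_irqrestore(kvm, idx, flags); while the
irq-context caller in irqfd_wakeup could keep plain srcu_read_lock/unlock.
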

I do like this approach!  If it turns out to be needed in (say) four
other places, then it might make sense for me to pull this into SRCU as
an additional API.

> > But first, one question...  Does the entire interrupt run in the guest in
> > interrupt context in the host?
> 
> It's interrupt context in the host.
> 
> > Or is there some sort of wakeup sequence
> > where the guest's interrupt handler runs in process context from the
> > host's perspective?  From a quick glance at the code, it looks like
> > the guest's interrupt handler runs in process context (from the host's
> > perspective) and that the SRCU read-side critical section is confined
> > to the host's interrupt handler.
> 
> Correct.
> 
> > Another (stupid) question...  Can the process-level uses of ->irq_srcu
> > instead use ->srcu?
> 
> I suppose you would synchronize_srcu on both in the write side?
> Probably not, because ->srcu has really, really long read-side
> critical sections.  Basically the entire EQS of RCU is a read-side
> ->srcu critical section.
> 
> Unfortunately the write side of ->irq_srcu is also relatively
> performance-critical, and since ->irq_srcu read-side critical sections
> are very short, having a separate SRCU instance solved the issue quite
> nicely.  Originally Christian Borntraeger had proposed turning the
> relevant synchronize_rcu into synchronize_rcu_expedited, but as you said
> it's not good for real-time so here we are.

Hey, I had to ask!  ;-)

							Thanx, Paul

> Paolo
> 
> > As the code is written now, it is OK to use
> > srcu_read_lock() and srcu_read_unlock() from interrupt handlers as
> > long as they are never used from either process level or from nested
> > interrupt handlers.  Does that work for you?
> > 
> > If this is the case, another approach would be for updates to wait for
> > both an SRCU grace period and an RCU-sched grace period, perhaps with
> > the latter expedited, though that won't make the real-time guys happy.
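
A minimal sketch of that update-side alternative, assuming irq-context
readers are covered by RCU-sched simply by virtue of running in an
interrupt handler, while ->irq_srcu covers the remaining readers (the
helper name is made up):

    /* Sketch only: wait out both kinds of readers before freeing old data. */
    static void kvm_irq_routing_wait_for_readers(struct kvm *kvm)
    {
            synchronize_srcu(&kvm->irq_srcu);  /* ->irq_srcu readers        */
            synchronize_sched_expedited();     /* interrupt-handler readers */
    }
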
> > 
> > 							Thanx, Paul
> > 
> >> Thanks,
> >>
> >> Paolo
> >>
> >>> Possible solution
> >>> -----------------
> >>> One way is to avoid the srcu_read_lock/unlock usage in irq context.
> >>> On the arm/arm64 architecture,
> >>> if (kvm_arch_set_irq_inatomic(&irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1,
> >>>       false) == -EWOULDBLOCK)
> >>> in irqfd_wakeup is always true, and hence schedule_work can be called
> >>> directly:
> >>>
> >>> @@ -195,6 +195,11 @@ int __attribute__((weak)) kvm_arch_set_irq_inatomic(
> >>>         int idx;
> >>>  
> >>>         if (flags & POLLIN) {
> >>> +               if (kvm_arch_set_irq_will_block_always()) {
> >>> +                       schedule_work(&irqfd->inject);
> >>> +                       goto skiplock;
> >>> +               }
> >>> +
> >>>                 idx = srcu_read_lock(&kvm->irq_srcu);
> >>>                 do {
> >>>                         seq = read_seqcount_begin(&irqfd->irq_entry_sc);
> >>> @@ -208,6 +213,7 @@ int __attribute__((weak)) kvm_arch_set_irq_inatomic(
> >>>                 srcu_read_unlock(&kvm->irq_srcu, idx);
> >>>         }
> >>>  
> >>> +skiplock:
> >>>         if (flags & POLLHUP) {
> >>>                 /* The eventfd is closing, detach from KVM */
> >>>                 unsigned long flags;
> >>>
> >>> This works without giving any warnings as well.
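
The kvm_arch_set_irq_will_block_always() helper used in the diff does not
exist in the tree; a sketch of one way it might be provided, assuming a
weak default plus per-architecture overrides:

    /* In virt/kvm/eventfd.c: weak default, preserves current behaviour. */
    bool __attribute__((weak)) kvm_arch_set_irq_will_block_always(void)
    {
            return false;
    }

    /*
     * In arm/arm64 KVM code: kvm_arch_set_irq_inatomic() always returns
     * -EWOULDBLOCK there, so irqfd_wakeup() can punt straight to the
     * injection worker.
     */
    bool kvm_arch_set_irq_will_block_always(void)
    {
            return true;
    }
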
> >>>
> >>> Is a patch welcome in that direction?
> >>>
> >>> Appreciate your feedback on this.
> >>>
> >>>
> >>>
> >>
> > 
> 



