Re: Possible race condition during lock_count increment in srcu_read_lock

On Mon, May 29, 2017 at 05:05:47PM +0200, Paolo Bonzini wrote:
> Adding Paul McKenney...
> 
> On 29/05/2017 16:30, Linu Cherian wrote:
> > In the KVM irq injection path, irqfd_wakeup calls
> > srcu_read_lock(&kvm->irq_srcu) in interrupt context.  A worker thread,
> > irqfd_inject, also calls srcu_read_lock (inside kvm_set_irq) in
> > process context.
> > 
> > This can lead to a race condition while incrementing the lock_count
> > (using __this_cpu_inc), since atomic instructions are not used.
> > 
> > Replacing __this_cpu_inc with this_cpu_inc (the atomic version) works
> > around the problem, which backs up this analysis.
> 
> Paul, how would you feel about using this_cpu_inc in
> __srcu_read_lock and __srcu_read_unlock?  Many architectures already
> provide an efficient this_cpu_inc, and any architecture that has LL/SC
> could provide it easily.
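
For context, the read-side fast path in question looks roughly like the
following (paraphrased from memory, so exact field names may differ
between kernel versions); the proposal amounts to using this_cpu_inc()
where __this_cpu_inc() is used today:

int __srcu_read_lock(struct srcu_struct *sp)
{
	int idx;

	idx = READ_ONCE(sp->completed) & 0x1;
	/*
	 * On many architectures __this_cpu_inc() compiles to a plain
	 * load/add/store, so an interrupt that calls srcu_read_lock()
	 * between the load and the store can lose an increment.
	 * this_cpu_inc() would make the update irq-safe.
	 */
	__this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
	smp_mb(); /* B */  /* Avoid leaking the critical section. */
	return idx;
}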

You actually are not supposed to call __srcu_read_lock() and
__srcu_read_unlock() from irq context.  (Unless you know that you
interrupted some context that is guaranteed never to use the srcu_struct
in question or some such.)

Given that this is the KVM irq injection path, I am guessing that it
is executed quite frequently, so that you do need a low-cost highly
scalable solution.  However, you only really need to exclude between
process level and the various interrupts, so there are three ways
that an architecture could handle this:

1.	Single memory-to-memory increment instructions, as you say.

2.	Atomic read-modify-write instructions, again as you say.

3.	Disable interrupts across __srcu_read_lock() and __srcu_read_unlock().

Different architectures would want different approaches, depending on what
instructions are available and the relative speed of these instructions.
It would not be hard to make this an architecture choice on the one
hand or to provide an srcu_read_lock_irq() and srcu_read_unlock_irq()
on the other.  But I recently got some serious pushback on RCU API
proliferation on the one hand and SRCU in particular on the other.  So
it might be good to look at some alternatives as well.
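
To make option 3 above concrete, such _irq variants might look
something like this (not an existing API, just a sketch of the idea;
interrupts only need to be excluded across the per-CPU counter update,
not across the whole read-side critical section):

static inline int srcu_read_lock_irq(struct srcu_struct *sp)
{
	int idx;

	local_irq_disable();	/* keep same-CPU irq readers off the counter */
	idx = __srcu_read_lock(sp);
	local_irq_enable();
	return idx;
}

static inline void srcu_read_unlock_irq(struct srcu_struct *sp, int idx)
{
	local_irq_disable();
	__srcu_read_unlock(sp, idx);
	local_irq_enable();
}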

But first, one question...  Does the entire interrupt run in the guest in
interrupt context in the host?  Or is there some sort of wakeup sequence
where the guest's interrupt handler runs in process context from the
host's perspective?  From a quick glance at the code, it looks like
the guest's interrupt handler runs in process context (from the host's
perspective) and that the SRCU read-side critical section is confined
to the host's interrupt handler.

Another (stupid) question...  Can the process-level uses of ->irq_srcu
instead use ->srcu?  As the code is written now, it is OK to use
srcu_read_lock() and srcu_read_unlock() from interrupt handlers as
long as they are never used from either process level or from nested
interrupt handlers.  Does that work for you?
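
In other words, something like the following for the process-level path
called out in the report, while irqfd_wakeup() keeps using ->irq_srcu
(illustrative only; whether this substitution is actually safe is
exactly the question being asked):

	/* Process context, e.g. in kvm_set_irq(): */
	idx = srcu_read_lock(&kvm->srcu);	/* was &kvm->irq_srcu */
	i = kvm_irq_map_gsi(kvm, irq_set, irq);
	srcu_read_unlock(&kvm->srcu, idx);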

If this is the case, another approach would be for updates to wait for
both an SRCU grace period and an RCU-sched grace period, perhaps with
the latter expedited, though that won't make the real-time guys happy.
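
As a sketch of that update side (irq_routing_sync() is just a made-up
name for whatever update path needs to wait out the readers, and this
assumes the irq-context readers would then rely on running
non-preemptibly rather than on srcu_read_lock()):

static void irq_routing_sync(struct kvm *kvm)
{
	/* Wait for the remaining sleepable SRCU readers. */
	synchronize_srcu(&kvm->irq_srcu);
	/* Wait for irq-context readers, which never sleep; expedited, as noted. */
	synchronize_sched_expedited();
}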

							Thanx, Paul

> Thanks,
> 
> Paolo
> 
> > Possible solution
> > -----------------
> > One way is to avoid the srcu_read_lock/unlock usage in irq context.
> > On arm/arm64,
> > if (kvm_arch_set_irq_inatomic(&irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1,
> >       false) == -EWOULDBLOCK)
> > in irqfd_wakeup is always true, and hence schedule_work can be called
> > directly:
> > 
> > @@ -195,6 +195,11 @@ int __attribute__((weak)) kvm_arch_set_irq_inatomic(
> >         int idx;
> >  
> >         if (flags & POLLIN) {
> > +               if (kvm_arch_set_irq_will_block_always()) {
> > +                       schedule_work(&irqfd->inject);
> > +                       goto skiplock;
> > +               }
> > +
> >                 idx = srcu_read_lock(&kvm->irq_srcu);
> >                 do {
> >                         seq = read_seqcount_begin(&irqfd->irq_entry_sc);
> > @@ -208,6 +213,7 @@ int __attribute__((weak)) kvm_arch_set_irq_inatomic(
> >                 srcu_read_unlock(&kvm->irq_srcu, idx);
> >         }
> >  
> > +skiplock:
> >         if (flags & POLLHUP) {
> >                 /* The eventfd is closing, detach from KVM */
> >                 unsigned long flags;
> > 
> > This works without giving any warnings as well.
> > 
> > Is a patch welcome in that direction?
> > 
> > Appreciate your feedback on this.
> > 
> > 
> > 
> 



