On Tue, May 30, 2017 at 03:49:31PM +0200, Paolo Bonzini wrote: Just out of curiosity, why are we off-list? > On 29/05/2017 22:14, Paul E. McKenney wrote: > > On Mon, May 29, 2017 at 06:38:19PM +0200, Paolo Bonzini wrote: > >> On 29/05/2017 17:50, Paul E. McKenney wrote: > >>> You actually are not supposed to call __srcu_read_lock() and > >>> __srcu_read_unlock() from irq context. (Unless you know that you > >>> interrupted some context that is guaranteed never to use the srcu_struct > >>> in question or some such.) > > I was going to improve the SRCU doc comments, and then started digging > into srcu.c history to find out why srcu_read_unlock uses this_cpu_inc. > > It turns out that back in 2012 __srcu_read_lock did two accesses to > percpu variables, while __srcu_read_unlock did one (a decrement). Then, > commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via > this_cpu_dec()", 2012-11-29) changed __srcu_read_unlock to use > this_cpu_dec and remove the preempt_disable/preempt_enable pair. Yes. > Nowadays, __srcu_read_lock only does one access to percpu variables so > we could do the same as commit 5a41344a3d83 did: > > diff --git a/include/linux/srcu.h b/include/linux/srcu.h > index 167ad8831aaf..4c1d5f7e62c4 100644 > --- a/include/linux/srcu.h > +++ b/include/linux/srcu.h > @@ -172,9 +172,7 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp) > { > int retval; > > - preempt_disable(); > retval = __srcu_read_lock(sp); > - preempt_enable(); > rcu_lock_acquire(&(sp)->dep_map); > return retval; > } > diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c This is the old version, which is there for fast recovery if the new scalable version fails. Which it has not recently, so it is likely going away in v4.13. So kernel/rcu/srcutree.c and kernel/rcu/srcutiny.c need similar changes. And yes, there is some push for kernel/rcu/srcutiny.c to go away as well, along with kernel/rcu/rcutiny.c, but in v4.12 they are still here. > index 584d8a983883..dea03614263f 100644 > --- a/kernel/rcu/srcu.c > +++ b/kernel/rcu/srcu.c > @@ -263,7 +263,7 @@ void cleanup_srcu_struct(struct srcu_struct *sp) > > /* > * Counts the new reader in the appropriate per-CPU element of the > - * srcu_struct. Must be called from process context. > + * srcu_struct. > * Returns an index that must be passed to the matching srcu_read_unlock(). > */ > int __srcu_read_lock(struct srcu_struct *sp) > @@ -271,7 +271,7 @@ int __srcu_read_lock(struct srcu_struct *sp) > int idx; > > idx = READ_ONCE(sp->completed) & 0x1; > - __this_cpu_inc(sp->per_cpu_ref->lock_count[idx]); > + this_cpu_inc(sp->per_cpu_ref->lock_count[idx]); > smp_mb(); /* B */ /* Avoid leaking the critical section. */ > return idx; > } > @@ -281,7 +281,6 @@ int __srcu_read_lock(struct srcu_struct *sp) > * Removes the count for the old reader from the appropriate per-CPU > * element of the srcu_struct. Note that this may well be a different > * CPU than that which was incremented by the corresponding srcu_read_lock(). > - * Must be called from process context. > */ > void __srcu_read_unlock(struct srcu_struct *sp, int idx) > { > > > As said earlier, this would slow down SRCU on machines where neither fast > this_cpu_inc nor atomic_inc_relaxed are available, and local_irq_save/restore > is slower than disabling preemption. The main one is s390, which however is > already paying the price in __srcu_read_unlock. > > Any objections to the above for 4.12, with commit message explaining both the > history of __srcu_read_unlock and what this fixes in KVM? With the changes also applied to kernel/rcu/srcutree.c and kernel/srcu/srcutiny.c, given taht s390 has not complained about __srcu_read_unlock(), no objections. Thanx, Paul > Thanks, > > Paolo > > >>> > >>> Given that this is the KVM irq injection path, I am guessing that it > >>> is executed quite frequently, so that you do need a low-cost highly > >>> scalable solution. However, you only really need to exclude between > >>> process level and the various interrupts, so there are three ways > >>> that an architecture could handle this: > >>> > >>> 1. Single memory-to-memory increment instructions, as you say. > >>> > >>> 2. Atomic read-modify-write instructions, again as you say. > >>> > >>> 3. Disable interrupts across __srcu_read_lock() and __srcu_read_unlock(). > >>> > >>> Different architectures would want different approached, depending on what > >>> instructions are available and the relative speed of these instructions. > >>> It would not be hard to make this an architecture choice on the one > >>> hand or to provide an srcu_read_lock_irq() and srcu_read_unlock_irq() > >>> on the other. But I recently got some serious pushback on RCU API > >>> proliferation on the one hand and SRCU in particular on the other. So > >>> it might be good to look at some alternatives as well. > >> > >> Yeah, I liked Linu's suggestion of this_cpu_inc because it's already > >> used in srcu_read_unlock, and because it can fall back to (3) > >> automatically. Relaxed atomics weren't much of a proposal, rather a way > >> to implement this_cpu_inc on some architectures that do not support it > >> yet (notably 32-bit ARM, MIPS and PPC). > >> > >> The disadvantage would be slowing down SRCU on machines where neither > >> (1) nor (2) are possible, and local_irq_save/restore is slower than > >> disabling preemption. But I can certainly wrap ->irq_srcu lock/unlock > >> into my own helpers that take care of saving/restoring the interrupt > >> state, like > >> > >> #define kvm_irq_srcu_read_lock_irqsave(kvm, flags) > >> #define kvm_irq_srcu_read_unlock_irqrestore(kvm, flags) > >> > >> This would be very simple, confined into KVM and respectful of the SRCU > >> API requirements. > > > > I do like this approach! If it turns out to be needed in (say) four > > other places, then it might make sense for me to pull this into SRCU as > > an additional API. > > Good. On the other hand, I was >