On 29/05/2017 22:14, Paul E. McKenney wrote:
> On Mon, May 29, 2017 at 06:38:19PM +0200, Paolo Bonzini wrote:
>> On 29/05/2017 17:50, Paul E. McKenney wrote:
>>> You actually are not supposed to call __srcu_read_lock() and
>>> __srcu_read_unlock() from irq context.  (Unless you know that you
>>> interrupted some context that is guaranteed never to use the
>>> srcu_struct in question or some such.)

I was going to improve the SRCU doc comments, and then started digging
into srcu.c history to find out why srcu_read_unlock uses this_cpu_inc.

It turns out that back in 2012 __srcu_read_lock did two accesses to
percpu variables, while __srcu_read_unlock did one (a decrement).
Then commit 5a41344a3d83 ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29) changed __srcu_read_unlock to use
this_cpu_dec and removed the preempt_disable/preempt_enable pair.
Nowadays, __srcu_read_lock only does one access to percpu variables,
so we could do the same as commit 5a41344a3d83 did:

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 167ad8831aaf..4c1d5f7e62c4 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -172,9 +172,7 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
 {
 	int retval;
 
-	preempt_disable();
 	retval = __srcu_read_lock(sp);
-	preempt_enable();
 	rcu_lock_acquire(&(sp)->dep_map);
 	return retval;
 }
diff --git a/kernel/rcu/srcu.c b/kernel/rcu/srcu.c
index 584d8a983883..dea03614263f 100644
--- a/kernel/rcu/srcu.c
+++ b/kernel/rcu/srcu.c
@@ -263,7 +263,7 @@ void cleanup_srcu_struct(struct srcu_struct *sp)
 
 /*
  * Counts the new reader in the appropriate per-CPU element of the
- * srcu_struct.  Must be called from process context.
+ * srcu_struct.
  * Returns an index that must be passed to the matching srcu_read_unlock().
  */
 int __srcu_read_lock(struct srcu_struct *sp)
@@ -271,7 +271,7 @@ int __srcu_read_lock(struct srcu_struct *sp)
 	int idx;
 
 	idx = READ_ONCE(sp->completed) & 0x1;
-	__this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
+	this_cpu_inc(sp->per_cpu_ref->lock_count[idx]);
 	smp_mb(); /* B */  /* Avoid leaking the critical section.  */
 	return idx;
 }
@@ -281,7 +281,6 @@ int __srcu_read_lock(struct srcu_struct *sp)
  * Removes the count for the old reader from the appropriate per-CPU
  * element of the srcu_struct.  Note that this may well be a different
  * CPU than that which was incremented by the corresponding srcu_read_lock().
- * Must be called from process context.
  */
 void __srcu_read_unlock(struct srcu_struct *sp, int idx)
 {

As said earlier, this would slow down SRCU on machines where neither a
fast this_cpu_inc nor atomic_inc_relaxed is available, and where
local_irq_save/restore is slower than disabling preemption.  The main
one is s390, which, however, is already paying that price in
__srcu_read_unlock.
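For reference, here is roughly what the pre-5a41344a3d83 reader side
looked like, paraphrased from memory using the old ->c[]/->seq[]
per-CPU fields rather than copied verbatim from the 2012 tree:

/*
 * Sketch of the 2012-era __srcu_read_lock, not the exact code.  The
 * two per-CPU increments must hit the same CPU's counters, hence the
 * preempt_disable/preempt_enable pair around them.
 */
int __srcu_read_lock(struct srcu_struct *sp)
{
	int idx;

	preempt_disable();
	idx = ACCESS_ONCE(sp->completed) & 0x1;
	ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->c[idx]) += 1;	/* access #1 */
	smp_mb(); /* B */  /* Avoid leaking the critical section. */
	ACCESS_ONCE(this_cpu_ptr(sp->per_cpu_ref)->seq[idx]) += 1;	/* access #2 */
	preempt_enable();
	return idx;
}

A preemption between the two increments could migrate the task and
split the ->c[]/->seq[] updates across CPUs, which is what the
preempt_disable/preempt_enable pair was protecting against.  With
today's single increment that hazard is gone, so one this_cpu_inc
suffices.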
Any objections to the above patch for 4.12, with a commit message
explaining both the history of __srcu_read_unlock and what this fixes
in KVM?

Thanks,

Paolo

>>> Given that this is the KVM irq injection path, I am guessing that it
>>> is executed quite frequently, so that you do need a low-cost, highly
>>> scalable solution.  However, you only really need to exclude between
>>> process level and the various interrupts, so there are three ways
>>> that an architecture could handle this:
>>>
>>> 1. Single memory-to-memory increment instructions, as you say.
>>>
>>> 2. Atomic read-modify-write instructions, again as you say.
>>>
>>> 3. Disable interrupts across __srcu_read_lock() and
>>>    __srcu_read_unlock().
>>>
>>> Different architectures would want different approaches, depending on
>>> what instructions are available and the relative speed of these
>>> instructions.  It would not be hard to make this an architecture
>>> choice on the one hand or to provide an srcu_read_lock_irq() and
>>> srcu_read_unlock_irq() on the other.  But I recently got some serious
>>> pushback on RCU API proliferation on the one hand and SRCU in
>>> particular on the other.  So it might be good to look at some
>>> alternatives as well.
>>
>> Yeah, I liked Linus's suggestion of this_cpu_inc because it's already
>> used in srcu_read_unlock, and because it can fall back to (3)
>> automatically.  Relaxed atomics weren't much of a proposal, rather a
>> way to implement this_cpu_inc on some architectures that do not
>> support it yet (notably 32-bit ARM, MIPS and PPC).
>>
>> The disadvantage would be slowing down SRCU on machines where neither
>> (1) nor (2) is possible, and local_irq_save/restore is slower than
>> disabling preemption.  But I can certainly wrap ->irq_srcu lock/unlock
>> into my own helpers that take care of saving/restoring the interrupt
>> state, like
>>
>> #define kvm_irq_srcu_read_lock_irqsave(kvm, flags)
>> #define kvm_irq_srcu_read_unlock_irqrestore(kvm, flags)
>>
>> This would be very simple, confined to KVM and respectful of the SRCU
>> API requirements.
>
> I do like this approach!  If it turns out to be needed in (say) four
> other places, then it might make sense for me to pull this into SRCU
> as an additional API.

Good.  On the other hand, I was