Linus,

The recursion code in the internals of the ftrace ring buffer requires
using preempt_disable_notrace(). But it has been discovered that on some
architectures this_cpu_read() calls preempt_disable(), and
this_cpu_read() is part of the recursion protection of the ring buffer.
The only reason this did not crash was the recursion protection in other
parts of ftrace. But if there's a path that does some kind of function
tracing without that protection, it will crash the kernel.

Use the __this_cpu_*() versions instead, which do not add
preempt_disable() or other unexpected function calls to the per-CPU
code. Preemption is already disabled on these paths, so the
__this_cpu_*() versions should be used anyway.

Please pull the latest trace-fixes-v4.0-rc4 tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
trace-fixes-v4.0-rc4

Tag SHA1:  8c92b3f282f17acc4a17be91c7836e1442847270
Head SHA1: 9a22e2db723ae2c5eaf53efc40a1638620c1eb7a


Steven Rostedt (1):
      ring-buffer: Replace this_cpu_*() with __this_cpu_*()

----
 kernel/trace/ring_buffer.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
---------------------------
commit 9a22e2db723ae2c5eaf53efc40a1638620c1eb7a
Author: Steven Rostedt <rostedt@xxxxxxxxxxx>
Date:   Tue Mar 17 10:40:38 2015 -0400

    ring-buffer: Replace this_cpu_*() with __this_cpu_*()

    It has come to my attention that this_cpu_read/write are horrible on
    architectures other than x86. Worse yet, they actually disable
    preemption or interrupts! This caused some unexpected tracing results
    on ARM.

      101.356868: preempt_count_add <-ring_buffer_lock_reserve
      101.356870: preempt_count_sub <-ring_buffer_lock_reserve

    ring_buffer_lock_reserve() has recursion protection that requires
    accessing a per-CPU variable. But since preempt_disable() is traced,
    it too got traced while accessing the variable that is supposed to
    prevent recursion, as shown above.

    The generic versions of this_cpu_read() and this_cpu_write() are:

    #define this_cpu_generic_read(pcp)                                  \
    ({  typeof(pcp) ret__;                                              \
        preempt_disable();                                              \
        ret__ = *this_cpu_ptr(&(pcp));                                  \
        preempt_enable();                                               \
        ret__;                                                          \
    })

    #define this_cpu_generic_to_op(pcp, val, op)                        \
    do {                                                                \
        unsigned long flags;                                            \
        raw_local_irq_save(flags);                                      \
        *__this_cpu_ptr(&(pcp)) op val;                                 \
        raw_local_irq_restore(flags);                                   \
    } while (0)

    This is unacceptable for locations that know they are already within
    preempt-disabled or interrupt-disabled sections.

    Paul McKenney stated that the __this_cpu_*() versions produce much
    better code on other architectures than this_cpu_*() does, if we know
    that the call is done in a preempt-disabled location.

    I also changed trace_recursive_unlock() to use a local variable
    instead of accessing the per-CPU variable twice.
    Link: http://lkml.kernel.org/r/20150317114411.GE3589@xxxxxxxxxxxxxxxxxx
    Link: http://lkml.kernel.org/r/20150317104038.312e73d1@xxxxxxxxxxxxxxxxxx

    Cc: stable@xxxxxxxxxxxxxxx
    Acked-by: Christoph Lameter <cl@xxxxxxxxx>
    Reported-by: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx>
    Tested-by: Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx>
    Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5040d44fe5a3..922048a0f7ea 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2679,7 +2679,7 @@ static DEFINE_PER_CPU(unsigned int, current_context);
 
 static __always_inline int trace_recursive_lock(void)
 {
-	unsigned int val = this_cpu_read(current_context);
+	unsigned int val = __this_cpu_read(current_context);
 	int bit;
 
 	if (in_interrupt()) {
@@ -2696,18 +2696,17 @@ static __always_inline int trace_recursive_lock(void)
 		return 1;
 
 	val |= (1 << bit);
-	this_cpu_write(current_context, val);
+	__this_cpu_write(current_context, val);
 
 	return 0;
 }
 
 static __always_inline void trace_recursive_unlock(void)
 {
-	unsigned int val = this_cpu_read(current_context);
+	unsigned int val = __this_cpu_read(current_context);
 
-	val--;
-	val &= this_cpu_read(current_context);
-	__this_cpu_write(current_context, val);
+	val &= val & (val - 1);
+	__this_cpu_write(current_context, val);
 }
 
 #else
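For readers unfamiliar with the val & (val - 1) idiom that the new
trace_recursive_unlock() relies on, here is a minimal user-space sketch
of the recursion-protection scheme. It is an illustration only, not
kernel code: the recursive_lock()/recursive_unlock() helpers and the
plain global standing in for the per-CPU current_context variable are
simplified, hypothetical stand-ins.

#include <assert.h>
#include <stdio.h>

/* Stand-in for the kernel's per-CPU current_context variable; the real
 * code uses DEFINE_PER_CPU() and, after this patch, accesses it with
 * __this_cpu_read()/__this_cpu_write(). */
static unsigned int current_context;

/* One bit per context level; lower numbers nest more deeply, mirroring
 * the ring buffer's NMI < IRQ < SOFTIRQ < NORMAL ordering. */
enum { CTX_NMI, CTX_IRQ, CTX_SOFTIRQ, CTX_NORMAL };

/* Returns 1 if this context level is already inside the ring buffer
 * (recursion detected), 0 after successfully claiming the level. */
static int recursive_lock(int bit)
{
	unsigned int val = current_context;

	if (val & (1U << bit))
		return 1;
	current_context = val | (1U << bit);
	return 0;
}

/* Contexts are released in LIFO order, so the most recently claimed
 * level is the lowest set bit; val & (val - 1) clears exactly that bit. */
static void recursive_unlock(void)
{
	unsigned int val = current_context;

	current_context = val & (val - 1);
}

int main(void)
{
	assert(recursive_lock(CTX_NORMAL) == 0); /* normal context enters */
	assert(recursive_lock(CTX_IRQ) == 0);    /* an IRQ nests on top   */
	assert(recursive_lock(CTX_IRQ) == 1);    /* same level: recursion */
	recursive_unlock();                      /* releases CTX_IRQ      */
	recursive_unlock();                      /* releases CTX_NORMAL   */
	assert(current_context == 0);
	printf("recursion checks passed\n");
	return 0;
}

Because the levels nest from normal up through softirq, irq, and NMI,
and are released in LIFO order, the most recently claimed level is
always the lowest set bit, and val & (val - 1) clears exactly that bit
without a branch. In the kernel itself, preemption is already disabled
at these call sites, which is what makes the cheaper __this_cpu_read()
and __this_cpu_write() accessors safe to use.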