Patch "tracing: Fix race issue between cpu buffer write and swap" has been added to the 5.4-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    tracing: Fix race issue between cpu buffer write and swap

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     tracing-fix-race-issue-between-cpu-buffer-write-and-.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit c5d7092e36c282cc427568558778e250ea858764
Author: Zheng Yejian <zhengyejian1@xxxxxxxxxx>
Date:   Thu Aug 31 21:27:39 2023 +0800

    tracing: Fix race issue between cpu buffer write and swap
    
    [ Upstream commit 3163f635b20e9e1fb4659e74f47918c9dddfe64e ]
    
    Warning happened in rb_end_commit() at code:
            if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
    
      WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
            rb_commit+0x402/0x4a0
      Call Trace:
       ring_buffer_unlock_commit+0x42/0x250
       trace_buffer_unlock_commit_regs+0x3b/0x250
       trace_event_buffer_commit+0xe5/0x440
       trace_event_buffer_reserve+0x11c/0x150
       trace_event_raw_event_sched_switch+0x23c/0x2c0
       __traceiter_sched_switch+0x59/0x80
       __schedule+0x72b/0x1580
       schedule+0x92/0x120
       worker_thread+0xa0/0x6f0
    
    It is because the race between writing event into cpu buffer and swapping
    cpu buffer through file per_cpu/cpu0/snapshot:
    
      Write on CPU 0             Swap buffer by per_cpu/cpu0/snapshot on CPU 1
      --------                   --------
                                 tracing_snapshot_write()
                                   [...]
    
      ring_buffer_lock_reserve()
        cpu_buffer = buffer->buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
        [...]
        rb_reserve_next_event()
          [...]
    
                                   ring_buffer_swap_cpu()
                                     if (local_read(&cpu_buffer_a->committing))
                                         goto out_dec;
                                     if (local_read(&cpu_buffer_b->committing))
                                         goto out_dec;
                                     buffer_a->buffers[cpu] = cpu_buffer_b;
                                     buffer_b->buffers[cpu] = cpu_buffer_a;
                                     // 2. cpu_buffer has swapped here.
    
          rb_start_commit(cpu_buffer);
          if (unlikely(READ_ONCE(cpu_buffer->buffer)
              != buffer)) { // 3. This check passed due to 'cpu_buffer->buffer'
            [...]           //    has not changed here.
            return NULL;
          }
                                     cpu_buffer_b->buffer = buffer_a;
                                     cpu_buffer_a->buffer = buffer_b;
                                     [...]
    
          // 4. Reserve event from 'cpu_buffer_a'.
    
      ring_buffer_unlock_commit()
        [...]
        cpu_buffer = buffer->buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
        rb_commit(cpu_buffer)
          rb_end_commit()  // 6. WARN for the wrong 'committing' state !!!
    
    Based on above analysis, we can easily reproduce by following testcase:
      ``` bash
      #!/bin/bash
    
      dmesg -n 7
      sysctl -w kernel.panic_on_warn=1
      TR=/sys/kernel/tracing
      echo 7 > ${TR}/buffer_size_kb
      echo "sched:sched_switch" > ${TR}/set_event
      while [ true ]; do
              echo 1 > ${TR}/per_cpu/cpu0/snapshot
      done &
      while [ true ]; do
              echo 1 > ${TR}/per_cpu/cpu0/snapshot
      done &
      while [ true ]; do
              echo 1 > ${TR}/per_cpu/cpu0/snapshot
      done &
      ```
    
    To fix it, IIUC, we can use smp_call_function_single() to do the swap on
    the target cpu where the buffer is located, so that above race would be
    avoided.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@xxxxxxxxxx
    
    Cc: <mhiramat@xxxxxxxxxx>
    Fixes: f1affcaaa861 ("tracing: Add snapshot in the per_cpu trace directories")
    Signed-off-by: Zheng Yejian <zhengyejian1@xxxxxxxxxx>
    Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ad0ee4de92485..dd64ce26d48dd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6791,6 +6791,11 @@ static int tracing_snapshot_open(struct inode *inode, struct file *file)
 	return ret;
 }
 
+static void tracing_swap_cpu_buffer(void *tr)
+{
+	update_max_tr_single((struct trace_array *)tr, current, smp_processor_id());
+}
+
 static ssize_t
 tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt,
 		       loff_t *ppos)
@@ -6849,13 +6854,15 @@ tracing_snapshot_write(struct file *filp, const char __user *ubuf, size_t cnt,
 			ret = tracing_alloc_snapshot_instance(tr);
 		if (ret < 0)
 			break;
-		local_irq_disable();
 		/* Now, we're going to swap */
-		if (iter->cpu_file == RING_BUFFER_ALL_CPUS)
+		if (iter->cpu_file == RING_BUFFER_ALL_CPUS) {
+			local_irq_disable();
 			update_max_tr(tr, current, smp_processor_id(), NULL);
-		else
-			update_max_tr_single(tr, current, iter->cpu_file);
-		local_irq_enable();
+			local_irq_enable();
+		} else {
+			smp_call_function_single(iter->cpu_file, tracing_swap_cpu_buffer,
+						 (void *)tr, 1);
+		}
 		break;
 	default:
 		if (tr->allocated_snapshot) {



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux