Patch "ring-buffer: Fix waking up ring buffer readers" has been added to the 6.7-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Wed, 27 Mar 2024 07:25:58 -0400

This is a note to let you know that I've just added the patch titled

    ring-buffer: Fix waking up ring buffer readers

to the 6.7-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     ring-buffer-fix-waking-up-ring-buffer-readers.patch
and it can be found in the queue-6.7 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit af99fb293a8d7d4a355694d64a58c48a83130767
Author: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
Date:   Fri Mar 8 15:24:03 2024 -0500

    ring-buffer: Fix waking up ring buffer readers
    
    [ Upstream commit b3594573681b53316ec0365332681a30463edfd6 ]
    
    A task can wait on a ring buffer for when it fills up to a specific
    watermark. The writer will check the minimum watermark that waiters are
    waiting for and if the ring buffer is past that, it will wake up all the
    waiters.
    
    The waiters are in a wait loop, and will first check if a signal is
    pending and then check if the ring buffer is at the desired level where it
    should break out of the loop.
    
    If a file that uses a ring buffer closes, and there's threads waiting on
    the ring buffer, it needs to wake up those threads. To do this, a
    "wait_index" was used.
    
    Before entering the wait loop, the waiter will read the wait_index. On
    wakeup, it will check if the wait_index is different than when it entered
    the loop, and will exit the loop if it is. The waker will only need to
    update the wait_index before waking up the waiters.
    
    This had a couple of bugs. One trivial one and one broken by design.
    
    The trivial bug was that the waiter checked the wait_index after the
    schedule() call. It had to be checked between the prepare_to_wait() and
    the schedule() which it was not.
    
    The main bug is that the first check to set the default wait_index will
    always be outside the prepare_to_wait() and the schedule(). That's because
    the ring_buffer_wait() doesn't have enough context to know if it should
    break out of the loop.
    
    The loop itself is not needed, because all the callers to the
    ring_buffer_wait() also has their own loop, as the callers have a better
    sense of what the context is to decide whether to break out of the loop
    or not.
    
    Just have the ring_buffer_wait() block once, and if it gets woken up, exit
    the function and let the callers decide what to do next.
    
    Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNSRZfg@xxxxxxxxxxxxxx/
    Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.792933613@xxxxxxxxxxx
    
    Cc: stable@xxxxxxxxxxxxxxx
    Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
    Cc: Mark Rutland <mark.rutland@xxxxxxx>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
    Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
    Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
    Cc: linke li <lilinke99@xxxxxx>
    Cc: Rabin Vincent <rabin@xxxxxx>
    Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
    Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
    Stable-dep-of: 761d9473e27f ("ring-buffer: Do not set shortest_full when full target is hit")
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6fa67c297e8fa..840301b04f31f 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -412,7 +412,6 @@ struct rb_irq_work {
 	struct irq_work			work;
 	wait_queue_head_t		waiters;
 	wait_queue_head_t		full_waiters;
-	long				wait_index;
 	bool				waiters_pending;
 	bool				full_waiters_pending;
 	bool				wakeup_full;
@@ -945,14 +944,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
 		rbwork = &cpu_buffer->irq_work;
 	}
 
-	rbwork->wait_index++;
-	/* make sure the waiters see the new index */
-	smp_wmb();
-
 	/* This can be called in any context */
 	irq_work_queue(&rbwork->work);
 }
 
+static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+	bool ret = false;
+
+	/* Reads of all CPUs always waits for any data */
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		return !ring_buffer_empty(buffer);
+
+	cpu_buffer = buffer->buffers[cpu];
+
+	if (!ring_buffer_empty_cpu(buffer, cpu)) {
+		unsigned long flags;
+		bool pagebusy;
+
+		if (!full)
+			return true;
+
+		raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+		pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+		ret = !pagebusy && full_hit(buffer, cpu, full);
+
+		if (!cpu_buffer->shortest_full ||
+		    cpu_buffer->shortest_full > full)
+			cpu_buffer->shortest_full = full;
+		raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+	}
+	return ret;
+}
+
 /**
  * ring_buffer_wait - wait for input to the ring buffer
  * @buffer: buffer to wait on
@@ -968,7 +993,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
 	struct ring_buffer_per_cpu *cpu_buffer;
 	DEFINE_WAIT(wait);
 	struct rb_irq_work *work;
-	long wait_index;
 	int ret = 0;
 
 	/*
@@ -987,81 +1011,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
 		work = &cpu_buffer->irq_work;
 	}
 
-	wait_index = READ_ONCE(work->wait_index);
-
-	while (true) {
-		if (full)
-			prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
-		else
-			prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
-
-		/*
-		 * The events can happen in critical sections where
-		 * checking a work queue can cause deadlocks.
-		 * After adding a task to the queue, this flag is set
-		 * only to notify events to try to wake up the queue
-		 * using irq_work.
-		 *
-		 * We don't clear it even if the buffer is no longer
-		 * empty. The flag only causes the next event to run
-		 * irq_work to do the work queue wake up. The worse
-		 * that can happen if we race with !trace_empty() is that
-		 * an event will cause an irq_work to try to wake up
-		 * an empty queue.
-		 *
-		 * There's no reason to protect this flag either, as
-		 * the work queue and irq_work logic will do the necessary
-		 * synchronization for the wake ups. The only thing
-		 * that is necessary is that the wake up happens after
-		 * a task has been queued. It's OK for spurious wake ups.
-		 */
-		if (full)
-			work->full_waiters_pending = true;
-		else
-			work->waiters_pending = true;
-
-		if (signal_pending(current)) {
-			ret = -EINTR;
-			break;
-		}
-
-		if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
-			break;
-
-		if (cpu != RING_BUFFER_ALL_CPUS &&
-		    !ring_buffer_empty_cpu(buffer, cpu)) {
-			unsigned long flags;
-			bool pagebusy;
-			bool done;
-
-			if (!full)
-				break;
-
-			raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-			pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
-			done = !pagebusy && full_hit(buffer, cpu, full);
+	if (full)
+		prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+	else
+		prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
 
-			if (!cpu_buffer->shortest_full ||
-			    cpu_buffer->shortest_full > full)
-				cpu_buffer->shortest_full = full;
-			raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
-			if (done)
-				break;
-		}
+	/*
+	 * The events can happen in critical sections where
+	 * checking a work queue can cause deadlocks.
+	 * After adding a task to the queue, this flag is set
+	 * only to notify events to try to wake up the queue
+	 * using irq_work.
+	 *
+	 * We don't clear it even if the buffer is no longer
+	 * empty. The flag only causes the next event to run
+	 * irq_work to do the work queue wake up. The worse
+	 * that can happen if we race with !trace_empty() is that
+	 * an event will cause an irq_work to try to wake up
+	 * an empty queue.
+	 *
+	 * There's no reason to protect this flag either, as
+	 * the work queue and irq_work logic will do the necessary
+	 * synchronization for the wake ups. The only thing
+	 * that is necessary is that the wake up happens after
+	 * a task has been queued. It's OK for spurious wake ups.
+	 */
+	if (full)
+		work->full_waiters_pending = true;
+	else
+		work->waiters_pending = true;
 
-		schedule();
+	if (rb_watermark_hit(buffer, cpu, full))
+		goto out;
 
-		/* Make sure to see the new wait index */
-		smp_rmb();
-		if (wait_index != work->wait_index)
-			break;
+	if (signal_pending(current)) {
+		ret = -EINTR;
+		goto out;
 	}
 
+	schedule();
+ out:
 	if (full)
 		finish_wait(&work->full_waiters, &wait);
 	else
 		finish_wait(&work->waiters, &wait);
 
+	if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+		ret = -EINTR;
+
 	return ret;
 }