Patch "ring-buffer: Only update pages_touched when a new page is touched" has been added to the 5.4-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Wed, 17 Apr 2024 13:18:26 -0400

This is a note to let you know that I've just added the patch titled

    ring-buffer: Only update pages_touched when a new page is touched

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     ring-buffer-only-update-pages_touched-when-a-new-pag.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 54668a2133aba01b97c8945fa5eb0483c99ad8fc
Author: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
Date:   Tue Apr 9 15:13:09 2024 -0400

    ring-buffer: Only update pages_touched when a new page is touched
    
    [ Upstream commit ffe3986fece696cf65e0ef99e74c75f848be8e30 ]
    
    The "buffer_percent" logic, which the ring buffer splice code uses to
    wake up waiting tasks only once the buffer has filled to the percentage
    set in the "buffer_percent" file, depends on three variables that
    determine the amount of data in the ring buffer:
    
     1) pages_read - incremented whenever a new sub-buffer is consumed
     2) pages_lost - incremented every time a writer overwrites a sub-buffer
     3) pages_touched - incremented when a write goes to a new sub-buffer
    
    The fill percentage is calculated as:
    
      100 * (pages_touched - (pages_lost + pages_read)) / nr_pages
    
    That is, the amount of data is the number of sub-buffers that have been
    touched, minus the number of sub-buffers lost and consumed, divided by
    the total number of sub-buffers (nr_pages). When this percentage is
    greater than the value in the "buffer_percent" file, splice readers
    waiting for that amount are woken up.
    
    It was observed that the amount of data read by splice steadily
    decreased the longer the trace ran. That is, if a reader asked for 60%,
    it would read over 60% when tracing first started. Later, however, it
    would be woken up with less than 60% of the buffer filled, and the
    amount read after each wakeup would slowly shrink until it was much
    less than the buffer percent.
    
    This was due to incorrect accounting of pages_touched. This value is
    incremented whenever a writer moves to a new sub-buffer, but it was
    incremented in the wrong place. If a writer overflowed the current
    sub-buffer, it would move to the next one. If it was preempted by an
    interrupt at that moment and the interrupt also wrote a trace event,
    the interrupt would move to the next sub-buffer too. Only one of them
    should increment the counter, but both did.
    
    Change the cmpxchg() that performs the actual switch of the tail page
    into a try_cmpxchg(), and increment pages_touched only on success. This
    increments the counter exactly once when a writer moves to a new
    sub-buffer, rather than twice when a writer and the writer that
    preempted it race to move to the same new sub-buffer.
    
    Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@xxxxxxxxxxxxxxxxxx
    
    Cc: stable@xxxxxxxxxxxxxxx
    Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
    Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
    Acked-by: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>
    Signed-off-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index d2dba546fbbe1..1f0ef428b2f1c 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1163,7 +1163,6 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
 	old_write = local_add_return(RB_WRITE_INTCNT, &next_page->write);
 	old_entries = local_add_return(RB_WRITE_INTCNT, &next_page->entries);
 
-	local_inc(&cpu_buffer->pages_touched);
 	/*
 	 * Just make sure we have seen our old_write and synchronize
 	 * with any interrupts that come in.
@@ -1200,8 +1199,9 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
 		 */
 		local_set(&next_page->page->commit, 0);
 
-		/* Again, either we update tail_page or an interrupt does */
-		(void)cmpxchg(&cpu_buffer->tail_page, tail_page, next_page);
+		/* Either we update tail_page or an interrupt does */
+		if (try_cmpxchg(&cpu_buffer->tail_page, &tail_page, next_page))
+			local_inc(&cpu_buffer->pages_touched);
 	}
 }