Re: [PATCH 1/1] tracing: fix bug in rb_per_cpu_empty() that might cause deadloop.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 21 Jul 2021 09:08:55 +0800 Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
 > On Tue, 20 Jul 2021 11:35:56 -0700 
 > Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxx> wrote: 
 >  
 > > On Tue, Jul 20, 2021 at 2:04 AM Haoran Luo <www@xxxxxxxxxxxxxx> wrote: 
 > > > 
 > > >         return reader->read == rb_page_commit(reader) && 
 > > >                 (commit == reader || 
 > > >                  (commit == head && 
 > > > -                 head->read == rb_page_commit(commit))); 
 > > > +                 (rb_page_commit(commit) == 0 || 
 > > > +                  head->read == rb_page_commit(commit)))); 
 > > >  } 
 > > 
 > > Can we please re-write that whole thing to be legible? 
 > > 
 > > That conditional was complicated before. It's worse now. 
 >  
 > I agree. I always cringe when I write code like this. For some reason, 
 > I write code I dislike over adding multiple returns. Not sure why I do 
 > that. Probably some childhood trauma. 
 >  
 > I'll add the comments below, as it looks correct. 
 >  
 > > 
 > > For example, just splitting it up would help: 
 > > 
 >  
 >     /* 
 >      * If what has been read on the reader page is not equal 
 >      * to the content on the reader page, then there's more 
 >      * data (not empty). 
 >      */ 
 >  
 > >         if (reader->read != rb_page_commit(reader)) 
 > >                 return false; 
 > > 
 >  
 >     /* 
 >      * The last thing written is on the reader page, and the above 
 >      * statement already shows everything written was read. 
 >      * (empty buffer) 
 >      */ 
 >  
 > >         if (commit == reader) 
 > >                 return true; 
 > > 
 >  
 >     /* 
 >      * The last page written does not equal the last page read, 
 >      * thus there's more data (not empty). 
 >      */ 
 >  
 > >         if (commit != head) 
 > >                 return false; 
 > > 
 >  
 > So, looking at this, I see that the bug is not here. head->read should 
 > never be non-zero. The issue is that we missed resetting "read" to zero 
 > before adding it back to the ring buffer. 
 >  
 >  
 > >         return !rb_page_commit(commit) || 
 > >                 commit->read == rb_page_commit(commit); 
 >  
 > Really, the original code worked because head->read is expected to be 
 > zero. And looking at the code more, it should even work with: 
 >  
 >     /* 
 >      * With the commit == head page (tail == head), and nothing on 
 >      * the commit page there's nothing in the buffer. 
 >      */ 
 >     if (!rb_page_commit(commit)) 
 >         return true; 
 >  
 > > 
 > > but adding comments to each conditional case would also be good. 
 > > 
 > > Note: I checked "head->read" to "commit->read" in that last 
 > > conditional, because we had _just_ verified that "commit" and "head" 
 > > are the same, and it seems to be more logical to check "commit->read" 
 > > with "rb_page_commit(commit)". 
 >  
 > I believe I kept it as "head" to re-enforce that the two are the same. 
 >  
 >  
 > > 
 > > (And somebody should check that I split it right - I'm just doing this 
 > > in the MUA without knowing the code: I hate conditionals that are 
 > > cross multiple lines like this, and I hate them particularly when they 
 > > turn out to be buggy, so when you fix a complex conditional, that's a 
 > > BIG sign to me that the problem was that the conditional was too 
 > > complex to begin with). 
 > > 
 > > Btw, that "rb_page_commit()" forces a new atomic read each time, so 
 > > the "rb_page_commit(commit)" value should probably be cached as a 
 > > variable before it's used twice. 
 > > 
 >  
 > Well, the double read of rb_page_commit() is not needed as the bug is 
 > that read is non zero. And I'm not sure this patch even fixes the bug, 
 > because read is not updated when commit is, thus read is some random 
 > number that could cause this to return a false positive. "read" needed 
 > to be reset when the page was added back to the ring buffer, and it 
 > appears that it was missed (ever though all the other fields were reset). 
 >  
 > Although this patch is wrong, I'm not against a clean up patch to make 
 > the code easier to read. 
 >  
 > I think the real fix is below (not even compiled tested). 
 >  
 > -- Steve 
 >  
 > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c 
 > index d1463eac11a3..0b9ce927e95f 100644 
 > --- a/kernel/trace/ring_buffer.c 
 > +++ b/kernel/trace/ring_buffer.c 
 > @@ -4445,6 +4445,7 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer) 
 >      local_set(&cpu_buffer->reader_page->write, 0); 
 >      local_set(&cpu_buffer->reader_page->entries, 0); 
 >      local_set(&cpu_buffer->reader_page->page->commit, 0); 
 > +    cpu_buffer->reader_page->read = 0; 
 >      cpu_buffer->reader_page->real_end = 0; 
 >  
 >  spin: 
 > 

I'll have a try and tell you the result. But I've seen the code written on

https://github.com/torvalds/linux/blob/2734d6c1b1a089fb593ef6a23d4b70903526fe0c/kernel/trace/ring_buffer.c#L4513
	cpu_buffer->reader_page->read = 0;

I though the algorithm is to reset the reader_page->read when it swaps with the
head page. And following the original algorithm the read field of a buffer page 
should be either 0 or its original value when the content to read exhaust. So I
added an extra conditional to ensure that.



[Index of Archives]     [Linux USB Development]     [Linux USB Development]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux