Re: [PATCH] ring-buffer: Add barrire in rb_move_tail

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 01, 2022 at 11:13:20AM -0400, Steven Rostedt wrote:
> On Tue, 30 Aug 2022 20:08:54 +0800
> "Jiazi.Li" <jqqlijiazi@xxxxxxxxx> wrote:
> 
> > The following crash issue was encountered several times when read
> > trace_pipe:
> > 
> > 1. in ring_buffer_event_data, found event's type is bigger than
> > RINGBUF_TYPE_DATA_TYPE_LEN_MAX, print WARNING:
> > [249033.677658]WARNING: CPU: 1 PID: 9975 at ring_buffer_event_data+0x38/0x48
> > 
> > 2. trace_read_pipe print this event resulting in crash issue:
> > [249033.806200] die+0xac/0x37c
> > [249033.806891] die_kernel_fault+0x60/0x70
> > [249033.807710] die_kernel_fault+0x0/0x70
> > [249033.808518] do_bad_area+0x30/0x130
> > [249033.809296] do_translation_fault+0x34/0x40
> > [249033.810158] do_mem_abort+0x4c/0xf8
> > [249033.810935] el1_da+0x1c/0xac
> > [249033.811647] string+0x38/0x74
> > [249033.812357] vsnprintf+0x430/0x65c
> > [249033.813122] seq_buf_vprintf+0x64/0xcc
> > [249033.813934] trace_seq_printf+0x70/0xd8
> > [249033.814755] trace_raw_output_ipi_handler+0x40/0x54
> > [249033.815705] print_trace_line+0x3ec/0x424
> > [249033.816547] tracing_read_pipe+0x28c/0x3ec
> > [249033.817402] __vfs_read+0x44/0x13c
> > [249033.818166] vfs_read+0xb8/0x150
> > [249033.818908] ksys_read+0x6c/0xcc
> > [249033.819651] __arm64_sys_read+0x18/0x20
> > [249033.820470] el0_svc_common+0x98/0x160
> > [249033.821278] el0_svc_handler+0x60/0x78
> > [249033.822086] el0_svc+0x8/0x300
> > 
> > ipi event is not enable in my device, the event tracing_read_pipe
> 
> The ipi event is *not* enabled? So you are saying that it translated the
> extra data at the end that was suppose to be PADDING?
> 

Yes, in my tracing instance, I only enable four events:
signal_generate, signal_deliver, sched_process_fork and
sched_process_exit.
I also not enable any filter on these events.
Since these four events not trigger frequently, so reader comsume
speed is bigger than event write, and there is no overwrite.

Reader mistook this extra data to ipi_entry event.

> > want to print is a PADDING event generated in rb_move_tail.
> > array[0] of this event is 0x18, ipi entry trace event's type value
> > is 0x18 too in my device.
> > so trace_read_pipe call trace_raw_output_ipi_handler to print this
> > event, eventually, system crash because illegal pointers.
> > 
> > In ring_buffer_peek, PADDING type event will not return to caller.
> > So at least when ring_buffer_peek is called, this event has not
> > been set to PADDING.
> > 
> > My reader task poll on trace_pipe and other files, wakeup
> > by other files and found that trace ring_buffer is not empty
> > because of read != commit.
> > But maybe because of cpu out-of-order execution, new event has not
> > been set to PADDING type, and leads to above issue.
> > 
> > Can we fix this issue by add a barrier between rb_reset_tail
> > and rb_end_commit?
> > Or are there other bugs that cause this issue?
> 
> I rather not add a barrier in the writer because that will slow down the
> tracing.
> 
> > 
> > Signed-off-by: Jiazi.Li <jiazi.li@xxxxxxxxxxxxx>
> > ---
> >  kernel/trace/ring_buffer.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > index d59b6a328b7f..20f3ef839467 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -2717,6 +2717,11 @@ rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
> >  
> >  	rb_reset_tail(cpu_buffer, tail, info);
> >  
> > +	/*
> > +	 * Add barrier to make sure tail event be set to
> > +	 * PADDING type before update commit.
> > +	 */
> > +	smp_mb();
> 
> Hmm, I'm thinking this may be a bigger race than a memory barrier. I'm
> thinking that the reader has full access to this information at this
> moment, as soon as the tail page moved forward. I unfortunately do not have
> the time right now to do a full investigation, but thank you for reporting
> this.
> 
> I think your mb() solution is just making the race window smaller and not
> actually fixing the race.
> 
> -- Steve
> 

Yes, I'm thinking this may be a race issue too.
But I didn't find race point through code review.
I have tried the following two methods to investigation this issue:
1. add log in rb_reset_tail, ring_buffer_peek, rb_per_cpu_empty
2. do not add log in these functions, just get timestamp and store to
ring_buffer struct for debug.
But All these changes make issue no longer reproduce, maybe because
these changes also make race window smaller.
Do you have any suggestions?

We now have found more than 20 similar issues, all are reader peek a
PADDING event.
Compared with ordinary data event, the time between write
RING_BUF_PADDING to memory and update commit is very short, because
rb_reset_tail and rb_end_commit are inline functions.
We do not found reader peek a ordinary data event with expired data,
So I think maybe add a barrier could fix this issue.

> >  	/* Commit what we have for now. */
> >  	rb_end_commit(cpu_buffer);
> >  	/* rb_end_commit() decs committing */
> 



[Index of Archives]     [Linux USB Development]     [Linux USB Development]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux