Re: [PATCH] trace: Fix race in trace_open and buffer resize call

On 9/15/2020 11:43 PM, Steven Rostedt wrote:

Actually, the available reader lock (&cpu_buffer->reader_lock) is not
helping here, so I took the ring buffer mutex lock to resolve this (this
came up on 4.19/5.4; in the latest tip it is the trace buffer lock). That
is why I have exported the API.

I'm saying, why don't you take the buffer->mutex in the
ring_buffer_reset_online_cpus() function? And remove all the protection in
tracing_reset_online_cpus()?

Yes, got your point; then we can avoid the export. Actually, we are seeing
the issue on older kernels like 4.19/4.14/5.4, where the patch below is not
present in the stable branches:

commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
avoiding synchronize_rcu for each CPU")

If you mark this patch for stable, you can add:

Depends-on: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")


Thanks Steven. Yes, this needs to be backported. I have tried this in 5.4,
but it needs more patches, such as commit
13292494379f92f532de71b31a54018336adc589 ("tracing: Make struct ring_buffer
less ambiguous").

Instead of protecting all the resets, can we do it individually, like below:


+++ b/kernel/trace/ring_buffer.c
@@ -4838,7 +4838,9 @@ rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
 {
        unsigned long flags;
+       struct trace_buffer *buffer = cpu_buffer->buffer;

+       mutex_lock(&buffer->mutex);
        raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);

        if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
@@ -4852,6 +4854,7 @@ static void reset_disabled_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)

  out:
        raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+       mutex_unlock(&buffer->mutex);
 }

Please let me know if the above looks good; we will do testing with this.
We can also use this directly in older kernels, in ring_buffer_reset_cpu(),
as sketched below.
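
For the older kernels, a rough sketch of the same idea applied directly in
ring_buffer_reset_cpu() could look like this. It is against a 5.4-era tree,
where the type is still struct ring_buffer; field and helper names here are
from memory and would need checking against the actual stable branch:

/*
 * Sketch only: take buffer->mutex around the per-cpu reset so it
 * serializes against ring_buffer_resize(), which holds the same mutex.
 */
void ring_buffer_reset_cpu(struct ring_buffer *buffer, int cpu)
{
	struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
	unsigned long flags;

	if (!cpumask_test_cpu(cpu, buffer->cpumask))
		return;

	atomic_inc(&buffer->resize_disabled);
	atomic_inc(&cpu_buffer->record_disabled);

	/* Make sure all commits have finished */
	synchronize_rcu();

	/* Keep the resize path out while this CPU's buffer is reset */
	mutex_lock(&buffer->mutex);
	raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);

	if (RB_WARN_ON(cpu_buffer, local_read(&cpu_buffer->committing)))
		goto out;

	arch_spin_lock(&cpu_buffer->lock);

	rb_reset_cpu(cpu_buffer);

	arch_spin_unlock(&cpu_buffer->lock);

 out:
	raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
	mutex_unlock(&buffer->mutex);

	atomic_dec(&cpu_buffer->record_disabled);
	atomic_dec(&buffer->resize_disabled);
}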


Actually, I had also thought of taking the mutex lock in
ring_buffer_reset_cpu() while doing the individual CPU reset, but this
could cause another problem:

Hmm, I think we should also take the buffer lock in the reset_cpu() call
too, and modify tracing_reset_cpu() the same way.


If we take the above patch, then this is not required.
Please let me know which approach you prefer.

Different CPU buffers may be in different states, so I have taken the lock
in tracing_reset_online_cpus().

Why would different states be an issue in synchronizing?

-- Steve


Yes, this should not be a problem.

void tracing_reset_online_cpus(struct array_buffer *buf)
{
	struct trace_buffer *buffer = buf->buffer;

	if (!buffer)
		return;

	buf->time_start = buffer_ftrace_now(buf, buf->cpu);

	ring_buffer_reset_online_cpus(buffer);
}

The reset_online_cpus() function is already doing the synchronization; we
don't need to do it twice.

I believe commit b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
avoiding synchronize_rcu for each CPU") made the synchronization in
tracing_reset_online_cpus() obsolete.

-- Steve

Yes, with the above patch there is no need to take the lock in
tracing_reset_online_cpus().
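
For reference, my reading of that suggestion, as a rough sketch on top of
b23d7a5f4a07 (the for_each_online_buffer_cpu() loop structure comes from
that commit); this is untested and only meant to show where buffer->mutex
would sit:

void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
{
	struct ring_buffer_per_cpu *cpu_buffer;
	int cpu;

	/* Serialize against ring_buffer_resize(), which takes buffer->mutex */
	mutex_lock(&buffer->mutex);

	for_each_online_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];

		atomic_inc(&cpu_buffer->resize_disabled);
		atomic_inc(&cpu_buffer->record_disabled);
	}

	/* Make sure all commits have finished */
	synchronize_rcu();

	for_each_online_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];

		reset_disabled_cpu_buffer(cpu_buffer);

		atomic_dec(&cpu_buffer->record_disabled);
		atomic_dec(&cpu_buffer->resize_disabled);
	}

	mutex_unlock(&buffer->mutex);
}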


--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


