Re: [PATCH] trace: Fix race in trace_open and buffer resize call

Hi Steven,
Thanks for the reply.

On 9/14/2020 9:49 PM, Steven Rostedt wrote:
> On Mon, 14 Sep 2020 10:00:50 +0530
> Gaurav Kohli <gkohli@xxxxxxxxxxxxxx> wrote:
>
>> Hi Steven,
>>
>> Please let us know, if below change looks good.
>> Or let us know some other way to solve this.
>>
>> Thanks,
>> Gaurav
>>
>>
>
> Hmm, for some reason, I don't see this in my INBOX, but it shows up in my
> LKML folder. :-/
>
>


>>> +void ring_buffer_mutex_release(struct trace_buffer *buffer)
>>> +{
>>> +    mutex_unlock(&buffer->mutex);
>>> +}
>>> +EXPORT_SYMBOL_GPL(ring_buffer_mutex_release);
>
> I really do not like to export these.
>

The existing reader lock (&cpu_buffer->reader_lock) does not help here, so I took the ring buffer mutex to resolve this (the issue was seen on 4.19/5.4); in the latest tip it is the trace buffer lock. That is why I exported the API.
>>> +/**
>>>     * ring_buffer_record_off - stop all writes into the buffer
>>>     * @buffer: The ring buffer to stop writes to.
>>>     *
>>> @@ -4918,6 +4937,8 @@ void ring_buffer_reset(struct trace_buffer *buffer)
>>>        struct ring_buffer_per_cpu *cpu_buffer;
>>>        int cpu;
>>>    +    /* prevent another thread from changing buffer sizes */
>>> +    mutex_lock(&buffer->mutex);
>>>        for_each_buffer_cpu(buffer, cpu) {
>>>            cpu_buffer = buffer->buffers[cpu];
>>> @@ -4936,6 +4957,7 @@ void ring_buffer_reset(struct trace_buffer *buffer)
>>>            atomic_dec(&cpu_buffer->record_disabled);
>>>            atomic_dec(&cpu_buffer->resize_disabled);
>>>        }
>>> +    mutex_unlock(&buffer->mutex);
>>>    }
>>>    EXPORT_SYMBOL_GPL(ring_buffer_reset);
>>>    diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>>> index f40d850..392e9aa 100644
>>> --- a/kernel/trace/trace.c
>>> +++ b/kernel/trace/trace.c
>>> @@ -2006,6 +2006,8 @@ void tracing_reset_online_cpus(struct array_buffer *buf)
>>>        if (!buffer)
>>>            return;
>>>    +    ring_buffer_mutex_acquire(buffer);
>>> +
>>>        ring_buffer_record_disable(buffer);
>
> Hmm, why do we disable here as it gets disabled again in the call to
> ring_buffer_reset_online_cpus()? Perhaps we don't need to disable the
You mean cpu_buffer->reader_lock in reset_disabled_cpu_buffer?
The reader lock is already taken there, but it does not help when tracing_open and ring_buffer_resize run in parallel on different CPUs.

We are seeing the race below, mainly during removal of extra pages:

                                          ring_buffer_resize
                                          // the portion of code below
                                          // is not under any lock
                                          nr_pages_to_update < 0
                                          init_list_head(new_pages)
                                          rb_update_pages

ring_buffer_resize
tracing_open
tracing_reset_online_cpus
ring_buffer_reset_cpu
                                          cpu_buffer_reset done
                                          // now lock started

                                          warning(nr_removed)

We see cases where the cpu buffer is reset by a concurrent tracing_open call, after which the problem shows up in rb_remove_pages.
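
To make the interleaving above concrete, here is a minimal userspace sketch (pthreads, not kernel code) of the idea behind the fix: resize and reset take the same mutex, so a reset cannot land between the point where resize prepares its page delta and the point where it applies it. All names (model_buffer, model_resize, model_reset, model_run) are made up for illustration, and the single nr_pages_to_update counter is a simplification standing in for the per-CPU new_pages state.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the ring buffer; not kernel code. */
struct model_buffer {
	pthread_mutex_t mutex;   /* plays the role of buffer->mutex */
	long nr_pages;           /* pages currently owned by the buffer */
	long nr_pages_to_update; /* delta prepared by resize, then applied */
	long lost_updates;       /* times the prepared delta was clobbered */
};

struct model_job {
	struct model_buffer *buf;
	int iterations;
};

static void model_init(struct model_buffer *b, long nr_pages)
{
	pthread_mutex_init(&b->mutex, NULL);
	b->nr_pages = nr_pages;
	b->nr_pages_to_update = 0;
	b->lost_updates = 0;
}

/* Resize: prepare the delta and apply it inside one critical section. */
static void model_resize(struct model_buffer *b, long new_nr_pages)
{
	long delta;

	pthread_mutex_lock(&b->mutex);
	delta = new_nr_pages - b->nr_pages;
	b->nr_pages_to_update = delta;        /* prepare step */
	if (b->nr_pages_to_update != delta)   /* would fire if a reset slipped in */
		b->lost_updates++;
	b->nr_pages += b->nr_pages_to_update; /* apply step */
	b->nr_pages_to_update = 0;
	pthread_mutex_unlock(&b->mutex);
}

/* Reset: clears the pending state, but only while holding the same mutex. */
static void model_reset(struct model_buffer *b)
{
	pthread_mutex_lock(&b->mutex);
	b->nr_pages_to_update = 0;
	pthread_mutex_unlock(&b->mutex);
}

static void *resize_worker(void *arg)
{
	struct model_job *job = arg;
	int i;

	/* Alternate between 8 and 4 pages; an even iteration count ends on 4. */
	for (i = 0; i < job->iterations; i++)
		model_resize(job->buf, (i % 2) ? 4 : 8);
	return NULL;
}

static void *reset_worker(void *arg)
{
	struct model_job *job = arg;
	int i;

	for (i = 0; i < job->iterations; i++)
		model_reset(job->buf);
	return NULL;
}

/* Run one resizer and one resetter in parallel; returns 0 on success. */
static int model_run(struct model_buffer *b, int iterations)
{
	struct model_job job = { b, iterations };
	pthread_t resizer, resetter;

	if (pthread_create(&resizer, NULL, resize_worker, &job))
		return -1;
	if (pthread_create(&resetter, NULL, reset_worker, &job)) {
		pthread_join(resizer, NULL);
		return -1;
	}
	pthread_join(resizer, NULL);
	pthread_join(resetter, NULL);
	return 0;
}
```

Calling model_init(&b, 8) followed by model_run(&b, 10000) should leave lost_updates at 0. The check inside model_resize only documents the invariant the shared lock is meant to guarantee; without a common lock around both paths, that invariant is what breaks.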

A similar case can occur in rb_insert_pages as well:

rb_insert_pages(struct ring_buffer_per_cpu *cpu_buffer)
{
        struct list_head *pages = &cpu_buffer->new_pages;
        int retries, success;
        /*
         * Before the lock is taken, the cpu buffer may be reset on another
         * CPU. We then see infinite-loop cases, because the new_pages
         * pointer was reset in rb_reset_cpu.
         */

        raw_spin_lock_irq(&cpu_buffer->reader_lock);
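
As a rough illustration of why the reset hurts rb_insert_pages, the following userspace sketch builds a small circular list in the style of the kernel's list_head and then re-initializes the head, as a reset on another CPU would. It is single-threaded and deterministic, so it only replays the bad interleaving and does not reproduce the actual loop. The names list_node, list_init, list_add_tail, list_count and pages_seen_by_insert are assumptions for this sketch, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 16

/* Hypothetical minimal circular doubly linked list, modeled on list_head. */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* Make the head point at itself, i.e. an empty list. */
static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link node in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Count the entries reachable from head. */
static int list_count(const struct list_node *head)
{
	const struct list_node *pos;
	int count = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		count++;
	return count;
}

/*
 * Replay the sequence: resize queues nr_new pages on new_pages, an
 * optional reset re-initializes the head, then the insert path looks
 * at the list. Returns how many queued pages the insert path can see.
 */
static int pages_seen_by_insert(int nr_new, int reset_in_between)
{
	struct list_node new_pages;
	struct list_node pages[MODEL_MAX_PAGES];
	int i;

	if (nr_new < 0)
		nr_new = 0;
	if (nr_new > MODEL_MAX_PAGES)
		nr_new = MODEL_MAX_PAGES;

	list_init(&new_pages);
	for (i = 0; i < nr_new; i++)
		list_add_tail(&pages[i], &new_pages);

	/* Stand-in for the reset running on another CPU before the lock. */
	if (reset_in_between)
		list_init(&new_pages);

	return list_count(&new_pages);
}
```

With no reset in between, the insert path sees every queued page. With the reset, the head points back at itself and the queued pages are no longer reachable from it, which is the state the insert path then has to work with.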

> buffer here. The only difference is that we have:
>
>   buf->time_start = buffer_ftrace_now(buf, buf->cpu);
>
> And that the above disables the entire buffer, whereas the reset only
> resets individual ones.
>
> But I don't think that will make any difference.
>
> -- Steve
>


--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


