Re: [PATCH bpf-next] bpf: Remove trace_printk_lock lock

On 12/13/22 1:53 PM, Jiri Olsa wrote:
On Tue, Dec 13, 2022 at 10:48:43AM -0800, Song Liu wrote:
On Tue, Dec 13, 2022 at 6:09 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:

Both the bpf_trace_printk and bpf_trace_vprintk helpers use a static buffer
guarded by the trace_printk_lock spin lock.

Contention on this spin lock causes issues for bpf programs attached to
the contention_begin tracepoint [1] [2].
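
For illustration, a minimal sketch of the kind of program that hits
this (hypothetical example, not taken verbatim from the reports):

  // SPDX-License-Identifier: GPL-2.0
  // build with clang -target bpf, needs libbpf headers
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* bpf_printk() expands to bpf_trace_printk(); if trace_printk_lock
   * is contended, lock:contention_begin fires, this program runs
   * again and re-enters the helper on the same lock. */
  SEC("tp/lock/contention_begin")
  int on_contention(void *ctx)
  {
          bpf_printk("contention begin");
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";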

Andrii suggested we could get rid of the contention by using trylock, but
we can actually get rid of the spinlock completely by using per-cpu buffers,
the same way bin_args are handled in the bpf_bprintf_prepare function.

Add 4 per-cpu buffers (1k each), which should be enough for all possible
nesting contexts (normal, softirq, irq, nmi) plus a possible (yet unlikely)
probe within the printk helpers themselves.

In the very unlikely case that we run out of nesting levels, the printk
will be omitted.

[1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@xxxxxxxxxxxxxx/
[2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@xxxxxxxxxxxxxx/

Reported-by: Hao Sun <sunhao.th@xxxxxxxxx>
Suggested-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
Signed-off-by: Jiri Olsa <jolsa@xxxxxxxxxx>

Maybe change the subject to 'Remove trace_printk_lock' instead
of 'Remove trace_printk_lock lock'? 'trace_printk_lock' already
implies 'lock'.

---
  kernel/trace/bpf_trace.c | 61 +++++++++++++++++++++++++++++++---------
  1 file changed, 47 insertions(+), 14 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3bbd3f0c810c..b9287b3a5540 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -369,33 +369,62 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
         return &bpf_probe_write_user_proto;
  }

-static DEFINE_RAW_SPINLOCK(trace_printk_lock);
-
  #define MAX_TRACE_PRINTK_VARARGS       3
  #define BPF_TRACE_PRINTK_SIZE          1024
+#define BPF_TRACE_PRINTK_LEVELS                4
+
+struct trace_printk_buf {
+       char data[BPF_TRACE_PRINTK_LEVELS][BPF_TRACE_PRINTK_SIZE];
+       int level;
+};
+static DEFINE_PER_CPU(struct trace_printk_buf, printk_buf);
+
+static void put_printk_buf(struct trace_printk_buf __percpu *buf)
+{
+       if (WARN_ON_ONCE(this_cpu_read(buf->level) == 0))
+               return;
+       this_cpu_dec(buf->level);
+       preempt_enable();
+}
+
+static bool get_printk_buf(struct trace_printk_buf __percpu *buf, char **data)
+{
+       int level;
+
+       preempt_disable();

Can we use migrate_disable() instead?

I think that should work.. while checking on that I found a comment
in include/linux/preempt.h (though dated):

I am not sure whether migrate_disable() will work. For example (see
the sketch after the list):
  . task1 takes over level=0 buffer, level = 1
  . task1 yields to task2 with preemption in the same cpu
  . task2 takes over level=1 buffer, level = 2
  . task2 yields to task1 in the same cpu
  . task1 releases the buffer, level = 1
  . task1 yields to task3 in the same cpu
  . task3 takes over level=1 buffer, level = 2
  <=== we have an issue here: both task2 and task3 use the level=1 buffer.
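
As a kernel-style sketch (a hypothetical variant reusing the struct and
constant from the patch above, not the patch as posted), the problem
looks like this:

  static bool get_printk_buf(struct trace_printk_buf __percpu *buf,
                             char **data)
  {
          int level;

          migrate_disable();
          /* tasks stay on this cpu but can still preempt each other
           * here, so the counter no longer behaves like a stack:
           *   task1: inc -> 1, uses data[0]
           *   task2: inc -> 2, uses data[1]
           *   task1: dec -> 1, releases data[0]
           *   task3: inc -> 2, uses data[1]  <- clashes with task2
           */
          level = this_cpu_inc_return(buf->level);
          if (level > BPF_TRACE_PRINTK_LEVELS) {
                  this_cpu_dec(buf->level);
                  migrate_enable();
                  return false;
          }
          *data = this_cpu_ptr(buf)->data[level - 1];
          return true;
  }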


   The end goal must be to get rid of migrate_disable

but it looks like both should work here, and there are trade-offs to
using each of them (preempt_disable() prevents the task from being
preempted at all, while migrate_disable() only pins it to the cpu and
still allows preemption)


+       level = this_cpu_inc_return(buf->level);
+       if (level > BPF_TRACE_PRINTK_LEVELS) {

Maybe add WARN_ON_ONCE() here?

ok, will add
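
For reference, with the WARN_ON_ONCE() added the helper could look
like this (a sketch; the failure path and buffer lookup are assumed
from context, the actual v2 may differ):

  static bool get_printk_buf(struct trace_printk_buf __percpu *buf,
                             char **data)
  {
          int level;

          preempt_disable();
          level = this_cpu_inc_return(buf->level);
          if (WARN_ON_ONCE(level > BPF_TRACE_PRINTK_LEVELS)) {
                  /* out of nesting levels, this printk is omitted */
                  this_cpu_dec(buf->level);
                  preempt_enable();
                  return false;
          }
          *data = this_cpu_ptr(buf)->data[level - 1];
          return true;
  }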

thanks,
jirka


