Re: [RFC PATCH bpf-next 15/16] tools/bpf: selftests: add dumper progs for bpf_map/task/task_file

Yonghong Song <yhs@xxxxxx> · Thu, 9 Apr 2020 23:41:51 -0700

On 4/9/20 8:33 PM, Alexei Starovoitov wrote:
On Wed, Apr 08, 2020 at 04:25:38PM -0700, Yonghong Song wrote:
For task/file, the dumper prints out:
   $ cat /sys/kernel/bpfdump/task/file/my1
     tgid      gid       fd      file
        1        1        0 ffffffff95c97600
        1        1        1 ffffffff95c97600
        1        1        2 ffffffff95c97600
     ....
     1895     1895      255 ffffffff95c8fe00
     1932     1932        0 ffffffff95c8fe00
     1932     1932        1 ffffffff95c8fe00
     1932     1932        2 ffffffff95c8fe00
     1932     1932        3 ffffffff95c185c0
...
+SEC("dump//sys/kernel/bpfdump/task/file")
+int BPF_PROG(dump_tasks, struct task_struct *task, __u32 fd, struct file *file,
+	     struct seq_file *seq, u64 seq_num)
+{
+	static char const banner[] = "    tgid      gid       fd      file\n";
+	static char const fmt1[] = "%8d %8d";
+	static char const fmt2[] = " %8d %lx\n";
+
+	if (seq_num == 0)
+		bpf_seq_printf(seq, banner, sizeof(banner));
+
+	bpf_seq_printf(seq, fmt1, sizeof(fmt1), task->tgid, task->pid);
+	bpf_seq_printf(seq, fmt2, sizeof(fmt2), fd, (long)file->f_op);
+	return 0;
+}

I wonder how fast it is to walk all files in all tasks with an empty
program? If it's fast, I can imagine a million use cases for such a searching
bpf prog, like finding which task owns a particular socket. This could be a
massive feature.

With one redundant spin_lock removed, it seems there will be one spin_lock per
prog invocation? Maybe eventually it can be amortized within the seq_file
iterating logic. It would be really awesome if the cost were just refcnt ++/--
per call plus rcu_read_lock.

The main seq_read() loop is below:
        while (1) {
                size_t offs = m->count;
                loff_t pos = m->index;

                p = m->op->next(m, p, &m->index);
                if (pos == m->index)
                        /* Buggy ->next function */
                        m->index++;
                if (!p || IS_ERR(p)) {
                        err = PTR_ERR(p);
                        break;
                }
                if (m->count >= size)
                        break;
                err = m->op->show(m, p);
                if (seq_has_overflowed(m) || err) {
                        m->count = offs;
                        if (likely(err <= 0))
                                break;
                }
        }

If we remove the spin_lock() as suggested in another email comment,
there will be no spin_lock() in the seq_ops->next() function, only
refcnt ++/-- and rcu_read_{lock,unlock}. The seq_ops->show() function
does not take any spin_lock() either.
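
To make the per-element cost concrete, here is a rough userspace model of the
loop quoted above. It is only a sketch: struct model_seq, struct model_iter and
all model_* functions are hypothetical stand-ins invented for illustration, not
kernel APIs, and the model neither resumes a partially consumed iterator nor
handles errors from show(). It only shows that each element costs one next()
and one show() call, and that a record which overflows the buffer is rolled
back by resetting count to the saved offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct seq_file: only what the loop touches. */
struct model_seq {
	char buf[64];
	size_t size;       /* usable capacity, must be <= sizeof(buf) */
	size_t count;      /* bytes buffered so far */
	int show_calls;    /* how many times show() ran */
};

/* Hypothetical iterator over a plain array of ints. */
struct model_iter {
	const int *items;
	size_t n;
	int next_calls;    /* how many times next() ran */
};

/* Same convention as the kernel helper: a full buffer means overflow. */
static int model_has_overflowed(const struct model_seq *m)
{
	return m->count == m->size;
}

/* Advance the position and return the element there, or NULL at the end. */
static const int *model_next(struct model_iter *it, long *index)
{
	it->next_calls++;
	++*index;
	return (size_t)*index < it->n ? &it->items[*index] : NULL;
}

/* Format one record; mark the buffer as overflowed if it does not fit. */
static void model_show(struct model_seq *m, const int *p)
{
	size_t room = m->size - m->count;
	int len = snprintf(m->buf + m->count, room, "%8d\n", *p);

	m->show_calls++;
	if (len >= 0 && (size_t)len < room)
		m->count += (size_t)len;
	else
		m->count = m->size;
}

/* Simplified version of the quoted loop; returns the records kept. */
static size_t model_fill(struct model_seq *m, struct model_iter *it, size_t want)
{
	size_t emitted = 0;
	long index = -1;

	assert(m->size <= sizeof(m->buf));
	for (;;) {
		size_t offs = m->count;  /* where this record would start */
		const int *p = model_next(it, &index);

		if (!p)
			break;
		if (m->count >= want)
			break;
		model_show(m, p);
		if (model_has_overflowed(m)) {
			m->count = offs;     /* drop the partial record */
			break;
		}
		emitted++;
	}
	return emitted;
}
```

With three items and a 64-byte buffer, the model calls next() four times (the
last call returns NULL) and show() three times. Shrinking the buffer to 20
bytes keeps two 9-byte records and rolls the third one back.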

I have not had time to do a perf measurement yet.
I will do one in the next revision.
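
For reference, one possible way to get a first number from user space is to
time a full read of the dumper file. The helper below is only a sketch under
that assumption: time_full_read() is a hypothetical name, and it measures
wall-clock time for the whole read, so it includes syscall and copy overhead
and not just the iteration cost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Read the file at path to EOF and return the elapsed time in nanoseconds,
 * or -1 on error. The number of bytes read is stored in *bytes_out.
 */
static long long time_full_read(const char *path, long long *bytes_out)
{
	char buf[4096];
	struct timespec t0, t1;
	long long total = 0;
	ssize_t n;
	int fd;

	assert(path && bytes_out);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0) {
		close(fd);
		return -1;
	}
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;
	if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
		n = -1;
	close(fd);
	if (n < 0)
		return -1;
	*bytes_out = total;
	return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}
```

It could be pointed at a dumper file such as /sys/kernel/bpfdump/task/file/my1,
which exists only with this patch set applied and the dumper created. Comparing
the result against a dumper whose program body is empty would approximate the
pure iteration cost asked about above.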