Re: [RFC PATCH bpf-next 03/16] bpf: provide a way for targets to register themselves

Yonghong Song <yhs@xxxxxx> · Wed, 15 Apr 2020 15:57:03 -0700

On 4/10/20 3:18 PM, Andrii Nakryiko wrote:
On Wed, Apr 8, 2020 at 4:26 PM Yonghong Song <yhs@xxxxxx> wrote:

Here, the target refers to a particular data structure
inside the kernel we want to dump. For example, it
can be all task_structs in the current pid namespace,
or it could be all open files for all task_structs
in the current pid namespace.

Each target is identified with the following information:
    target_rel_path   <=== relative path to /sys/kernel/bpfdump
    target_proto      <=== kernel func proto which represents
                           bpf program signature for this target
    seq_ops           <=== seq_ops for seq_file operations
    seq_priv_size     <=== seq_file private data size
    target_feature    <=== target specific feature which needs
                           handling outside seq_ops.
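
For illustration only, the five pieces of information above can be modelled in user space roughly as follows. This is a sketch, not the code from the patch: the toy_* names, the fixed-size table, and the use of a void pointer in place of the seq_operations pointer are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space mirror of the per-target information. */
struct toy_target {
	const char *rel_path;        /* relative to /sys/kernel/bpfdump */
	const char *proto;           /* name of the kernel func proto */
	const void *seq_ops;         /* stands in for the seq_file operations */
	unsigned int seq_priv_size;  /* seq_file private data size */
	unsigned int feature;        /* target specific feature flags */
};

#define TOY_MAX_TARGETS 16

static struct toy_target toy_targets[TOY_MAX_TARGETS];
static size_t toy_nr_targets;

/* Look up a registered target by its relative path; NULL if absent. */
const struct toy_target *toy_find_target(const char *rel_path)
{
	size_t i;

	for (i = 0; i < toy_nr_targets; i++)
		if (strcmp(toy_targets[i].rel_path, rel_path) == 0)
			return &toy_targets[i];
	return NULL;
}

/* Register a target; returns 0 on success, -1 on duplicate or full table. */
int toy_reg_target(const char *rel_path, const char *proto,
		   const void *seq_ops, unsigned int seq_priv_size,
		   unsigned int feature)
{
	if (toy_find_target(rel_path) || toy_nr_targets == TOY_MAX_TARGETS)
		return -1;

	toy_targets[toy_nr_targets++] = (struct toy_target){
		.rel_path = rel_path,
		.proto = proto,
		.seq_ops = seq_ops,
		.seq_priv_size = seq_priv_size,
		.feature = feature,
	};
	return 0;
}
```

There is deliberately no unregister function in the sketch, matching the description that targets are registered but never removed.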

It's not clear what "feature" stands for here... Is this just a sort
of private_data passed through to dumper?

The target relative path is a directory relative to /sys/kernel/bpfdump/.
For example, it could be:
    task                  <=== all tasks
    task/file             <=== all open files under all tasks
    ipv6_route            <=== all ipv6_routes
    tcp6/sk_local_storage <=== all tcp6 socket local storages
    foo/bar/tar           <=== all tar's in bar in foo

^^ this seems useful, but I don't think the code as is supports more than 2 levels?
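
To make the depth question concrete, here is a user-space sketch of walking a relative path of arbitrary depth one component at a time. A kernel version would presumably look up or create one directory per component in the same kind of loop. The toy_* names and the limits are hypothetical, and this is not the code from the patch.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NAME_MAX 32

/*
 * Copy each '/'-separated component of rel_path into names[] and return
 * the number of components. Returns -1 if a component is empty (leading,
 * doubled, or trailing slash), does not fit in TOY_NAME_MAX bytes, or if
 * there are more than max_names components.
 */
int toy_split_target_path(const char *rel_path,
			  char names[][TOY_NAME_MAX], int max_names)
{
	const char *p = rel_path;
	int n = 0;

	while (*p) {
		const char *slash = strchr(p, '/');
		size_t len = slash ? (size_t)(slash - p) : strlen(p);

		if (len == 0 || len >= TOY_NAME_MAX || n == max_names)
			return -1;

		memcpy(names[n], p, len);
		names[n][len] = '\0';
		n++;

		p += len;
		if (*p == '/') {
			p++;
			if (*p == '\0')
				return -1; /* trailing slash */
		}
	}
	return n;
}
```

With a loop like this, "task" yields one component, "task/file" two, and "foo/bar/tar" three, so nothing in the walk itself is tied to a two-level limit.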

The "target_feature" is mostly used for reusing existing seq_ops.
For example, for /proc/net/<> stats, the "net" namespace is often
stored in file private data. The target_feature enables bpf based
dumper to set "net" properly for itself before calling shared
seq_ops.

bpf_dump_reg_target() is implemented so targets
can register themselves. Currently, modules are not
supported, so there is no bpf_dump_unreg_target().
The main reason is that BTF is not yet available
for modules.

Since a target might call bpf_dump_reg_target() before
the bpfdump mount point is created, __bpfdump_init()
may be called in bpf_dump_reg_target() as well.
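
The ordering issue described here is the usual "initialize on first use" pattern. A minimal user-space sketch follows, assuming a single-threaded caller (a real kernel version would need its own locking). All toy_* names are hypothetical; toy_dump_init() only stands in for __bpfdump_init().

```c
#include <stdbool.h>

static bool toy_initialized;
static int toy_init_calls;

/* Stands in for __bpfdump_init(): does the real work only once. */
int toy_dump_init(void)
{
	if (toy_initialized)
		return 0;
	toy_init_calls++;       /* mount point setup would happen here */
	toy_initialized = true;
	return 0;
}

/* Registration path: make sure init has run before doing anything else. */
int toy_reg_target(const char *rel_path)
{
	int err = toy_dump_init();

	if (err)
		return err;
	(void)rel_path;         /* directory creation would happen here */
	return 0;
}

/* Regular init path: harmless if a target already triggered init. */
int toy_late_init(void)
{
	return toy_dump_init();
}

/* Number of times the real initialization work was performed. */
int toy_init_count(void)
{
	return toy_init_calls;
}
```

Whichever of the two entry points runs first performs the setup, and the other one becomes a no-op.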

The file-based dumpers will be regular files under
the specific target directory. For example,
    task/my1      <=== dumper "my1" iterates through all tasks
    task/file/my2 <=== dumper "my2" iterates through all open files
                       under all tasks

Signed-off-by: Yonghong Song <yhs@xxxxxx>
---
  include/linux/bpf.h |   4 +
  kernel/bpf/dump.c   | 190 +++++++++++++++++++++++++++++++++++++++++++-
  2 files changed, 193 insertions(+), 1 deletion(-)

+

[...]

+       if (S_ISDIR(mode)) {
+               inode->i_op = i_ops;
+               inode->i_fop = f_ops;
+               inc_nlink(inode);
+               inc_nlink(dir);
+       } else {
+               inode->i_fop = f_ops;
+       }
+
+       d_instantiate(dentry, inode);
+       dget(dentry);

lookup_one_len already bumped refcount, why the second time here?

This is an artifact carried over from security/inode.c:

void securityfs_remove(struct dentry *dentry)
{
        struct inode *dir;

        if (!dentry || IS_ERR(dentry))
                return;

        dir = d_inode(dentry->d_parent);
        inode_lock(dir);
        if (simple_positive(dentry)) {
                if (d_is_dir(dentry))
                        simple_rmdir(dir, dentry);
                else
                        simple_unlink(dir, dentry);
                dput(dentry);
        }
        inode_unlock(dir);
        simple_release_fs(&mount, &mount_count);
}
EXPORT_SYMBOL_GPL(securityfs_remove);

I did not implement bpfdumpfs_remove like the above.
I just use simple_unlink, so I indeed do not need the dget() above.
I have removed it in RFC v2; I tested the change and it works fine.

I think security/inode.c may not need that additional
reference either.
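
To spell out the reference counting, here is a toy user-space model, not kernel code. It assumes, as I read fs/libfs.c, that simple_unlink() itself drops one reference on the dentry, and that the lookup returns a dentry holding one reference. All toy_* names are made up for illustration.

```c
#include <assert.h>

/* Toy stand-in for a dentry: only the reference count matters here. */
struct toy_dentry {
	int refcount;
};

/* Models the lookup: the caller gets the dentry with one reference. */
void toy_lookup(struct toy_dentry *d)
{
	d->refcount = 1;
}

void toy_dget(struct toy_dentry *d)
{
	d->refcount++;
}

void toy_dput(struct toy_dentry *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* Models simple_unlink(): assumed to drop one reference itself. */
void toy_simple_unlink(struct toy_dentry *d)
{
	toy_dput(d);
}

/* Creation path; extra_dget models the dget() after d_instantiate(). */
void toy_create(struct toy_dentry *d, int extra_dget)
{
	toy_lookup(d);
	if (extra_dget)
		toy_dget(d);
}

/* Removal as in securityfs_remove(): unlink, then one more dput(). */
void toy_remove_securityfs_style(struct toy_dentry *d)
{
	toy_simple_unlink(d);
	toy_dput(d);
}

/* Removal with simple_unlink() only. */
void toy_remove_unlink_only(struct toy_dentry *d)
{
	toy_simple_unlink(d);
}
```

In this model the extra dget() is balanced only by the extra dput() in the securityfs-style removal. Creating with the extra dget() and removing with simple_unlink() alone leaves one reference behind, while dropping both the dget() and the extra dput() also balances out.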

+       inode_unlock(dir);
+       return dentry;
+
+dentry_put:
+       dput(dentry);
+       dentry = ERR_PTR(err);
+unlock:
+       inode_unlock(dir);
+       return dentry;
+}
+

[...]