On Thu, Jan 6, 2022 at 3:03 PM <sdf@xxxxxxxxxx> wrote:
>
> On 01/06, Hao Luo wrote:
> > Bpffs is a pseudo file system that persists bpf objects. Previously
> > bpf objects could only be pinned in bpffs; this patchset extends
> > pinning to allow bpf objects to be pinned (or exposed) to other file
> > systems.
> >
> > In particular, this patchset allows pinning bpf objects in kernfs. This
> > creates a new file entry in the kernfs file system and the created file
> > is able to reference the bpf object. By doing so, bpf can be used to
> > customize the file's operations, such as seq_show.
> >
> > As a concrete usecase of this feature, this patchset introduces a
> > simple new program type called 'bpf_view', which can be used to format
> > a seq file by a kernel object's state. By pinning a bpf_view program
> > into a cgroup directory, userspace is able to read the cgroup's state
> > from a file in a format defined by the bpf program.
> >
> > Different from bpffs, kernfs doesn't have a callback when a kernfs node
> > is freed, which is a problem if we allow the kernfs node to hold an
> > extra reference of the bpf object, because there is no chance to dec
> > the object's refcnt. Therefore the kernfs node created by pinning
> > doesn't hold a reference of the bpf object. The lifetime of the kernfs
> > node depends on the lifetime of the bpf object. Rather than "pinning in
> > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > pinned in bpffs first before it can be pinned in kernfs. When the
> > object is unpinned from bpffs, its kernfs nodes will be removed
> > automatically. This somehow treats a pinned bpf object as a persistent
> > "device".
> >
> > We rely on fsnotify to monitor the inode events in bpffs. A new function
> > bpf_watch_inode() is introduced. It allows registering a callback
> > function at inode destruction. For the kernfs case, a callback that
> > removes the kernfs node is registered at the destruction of bpffs
> > inodes. For other file systems such as sockfs, bpf_watch_inode() can
> > monitor the destruction of sockfs inodes and the created file entry can
> > hold the bpf object's reference. In this case, it is truly "pinning".
> >
> > File operations other than seq_show can also be implemented using bpf.
> > For example, bpf may be of help for .poll and .mmap in kernfs.
>
> This looks awesome!
>
> One thing I don't understand is: why did you go through the pinning
> interface VS regular attach/detach? IOW, why not allow regular
> sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> (which, in turn, creates the kernfs nodes). Seems like this way you can
> drop the requirement on the object being pinned in the bpffs first?

Thanks Stan. Yeah, the attach/detach approach is definitely another
option. IIUC, in comparison to pinning, attach/detach would only work for
cgroups, is that right? Pinning may be used on other file systems, such as
sockfs, sysfs or resctrl. But I don't know whether this generality is
welcome, and implementing seq_show is the only concrete use case I can
think of right now. If people think the ability to create files in other
subsystems is not a good idea, I'd be happy to take a look at the
attach/detach approach, and that may be the right way.
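
For anyone following the thread, here is a toy user-space sketch of the lifetime rule described in the cover letter: an object must be pinned in bpffs before it can be exposed, the exposed entries take no reference, and unpinning runs a destruction callback that removes them. Every name below (toy_obj, toy_pin, toy_expose, toy_unpin, and so on) is hypothetical and made up for illustration; none of this is kernel code or the patchset's actual implementation, and the callback field only stands in for what bpf_watch_inode() is described as doing.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4

struct toy_obj;
typedef void (*toy_destroy_cb)(struct toy_obj *obj);

/* Hypothetical stand-in for a bpf object; not a kernel structure. */
struct toy_obj {
    int refcnt;                       /* references held by real pins only */
    int pinned;                       /* 1 while pinned in the toy "bpffs" */
    int node_live[TOY_MAX_NODES];     /* exposed "kernfs" entries, no refs */
    toy_destroy_cb on_inode_destroy;  /* mimics a registered watch callback */
};

/* Callback run when the toy "bpffs inode" goes away: drop all entries. */
void toy_remove_nodes(struct toy_obj *obj)
{
    for (int i = 0; i < TOY_MAX_NODES; i++)
        obj->node_live[i] = 0;
}

/* Pinning in the toy "bpffs" is the only step that takes a reference. */
void toy_pin(struct toy_obj *obj)
{
    obj->refcnt++;
    obj->pinned = 1;
}

/* Expose to the toy "kernfs": needs a prior pin, takes no reference.
 * Returns the entry index, or -1 if not pinned or out of slots. */
int toy_expose(struct toy_obj *obj)
{
    if (!obj->pinned)
        return -1;
    for (int i = 0; i < TOY_MAX_NODES; i++) {
        if (!obj->node_live[i]) {
            obj->node_live[i] = 1;
            obj->on_inode_destroy = toy_remove_nodes;
            return i;
        }
    }
    return -1;
}

/* Unpinning fires the destruction callback first, then drops the ref. */
void toy_unpin(struct toy_obj *obj)
{
    if (!obj->pinned)
        return;
    if (obj->on_inode_destroy)
        obj->on_inode_destroy(obj);
    obj->on_inode_destroy = NULL;
    obj->pinned = 0;
    obj->refcnt--;
}

/* Count the entries that are still visible. */
int toy_live_nodes(const struct toy_obj *obj)
{
    int n = 0;
    for (int i = 0; i < TOY_MAX_NODES; i++)
        n += obj->node_live[i];
    return n;
}

/* Walks through the sequence: expose fails, pin, expose twice, unpin. */
void toy_demo(void)
{
    struct toy_obj obj = {0};

    assert(toy_expose(&obj) == -1);    /* not pinned yet */
    toy_pin(&obj);
    assert(toy_expose(&obj) == 0);
    assert(toy_expose(&obj) == 1);
    assert(obj.refcnt == 1);           /* exposing added no references */
    toy_unpin(&obj);
    assert(toy_live_nodes(&obj) == 0); /* entries vanished with the pin */
}
```

In this model the attach/detach alternative would roughly correspond to dropping the "pinned" precondition in toy_expose() and tying entry removal to a detach call instead of the unpin callback.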