On Mon, Apr 24, 2023 at 9:45 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 4/16/2023 3:55 PM, Amir Goldstein wrote:
> > On Tue, Feb 28, 2023 at 5:47 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> From time to time, new syscalls have been proposed to gain more observability
> >> for file systems:
> >>
> >> (1) getvalues() [0]. It uses a hierarchical namespace API to gather and return
> >> multiple values in a single syscall.
> >> (2) cachestat() [1]. It returns the cache status (e.g., number of dirty pages)
> >> of a given file in a scalable way.
> >>
> >> All these proposals require adding a new syscall. Here I would like to propose
> >> another solution for file system observability: a bpf iterator for file system
> >> objects. The initial idea came when I was trying to implement a filefrag-like
> >> page cache tool with support for multi-order folios, so that we can know the
> >> number of multi-order folios and the orders of those folios in the page cache.
> >> After developing a demo for it, I realized that we could use it to provide more
> >> observability for file system objects, e.g., dumping the per-cpu iostat for a
> >> super block [2], iterating all inodes in a super block to dump info for
> >> specific inodes (e.g., unlinked but pinned inodes), or displaying the flags of a
> >> specific mount.
> >>
> >> The BPF iterator was introduced in v5.8 [3] to support flexible content dumping
> >> for kernel objects. It works by creating a bpf iterator file [4], which is a
> >> seq-like read-only file, and the content of the bpf iterator file is determined
> >> by a previously loaded bpf program, so userspace can read the bpf iterator file
> >> to get the information it needs. However, there are some unresolved issues:
> >> (1) The privilege.
> >> Loading the bpf program requires CAP_SYS_ADMIN or CAP_BPF. This means that the
> >> observability will only be available to privileged processes. Maybe we can load
> >> the bpf program through a privileged process and make the bpf iterator file
> >> readable for normal users.
> >> (2) Prevent pinning the super block.
> >> In the current naive implementation, the bpf iterator simply pins the
> >> super block of the passed fd and prevents the super block from being destroyed.
> >> Perhaps fs-pin is a better choice, so the bpf iterator can be deactivated after
> >> the filesystem is unmounted.
> >>
> >> I hope to send out an RFC soon before LSF/MM/BPF for further discussion.
> >
> > Hi Hou,
> >
> > IIUC, there is not much value in making this a cross track session.
> > Seems like an FS track session that has not much to do with BPF
> > development.
> >
> > Am I understanding correctly or are there any cross subsystem
> > interactions that need to be discussed?
>
> Yes. Although the patchset for the file-system iterator is still not ready, I
> think the BPF mechanisms for the file-system iterator are ready, so a cross track
> session may be unnecessary.
> >
> > Perhaps we can join you as co-speaker for Miklos' traditional
> > "fsinfo" session?
>
> Thanks. I am glad to be a co-speaker for the fsinfo session.

All right.
I put you down as a co-speaker with Miklos on the fsinfo session.

Thanks,
Amir.
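
As a minimal sketch of the userspace side of the flow described in the quoted proposal (reading a bpf iterator file, which behaves like a seq-like read-only file), the C helper below copies such a file to an output descriptor. The function name dump_iter_file and any path passed to it are hypothetical and not taken from the thread. The kernel-side bpf program and the file-system iterator itself are not shown, since the patchset is described above as not yet ready.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Copy the whole content of a seq-like read-only file (for example a
 * pinned bpf iterator file) to out_fd.
 *
 * dump_iter_file is a hypothetical helper name used for illustration.
 * Returns the number of bytes copied, or -1 on error.
 */
static long dump_iter_file(const char *path, int out_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Keep reading until read() reports end of file (returns 0). */
    while ((n = read(fd, buf, sizeof(buf))) != 0) {
        ssize_t off = 0;

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }

        /* write() may be short, so loop until the chunk is flushed. */
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                close(fd);
                return -1;
            }
            off += w;
        }
        total += n;
    }

    close(fd);
    return total;
}
```

Assuming a privileged process has already loaded the bpf program and pinned the iterator at some path under bpffs, a call such as dump_iter_file(path, STDOUT_FILENO) from an unprivileged reader would print whatever the loaded program emits, provided the file permissions allow it. That last condition is exactly the open privilege question raised in the proposal.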