Re: [LSF/MM/BPF TOPIC] bpf iterator for file-system

> On Feb 27, 2023, at 7:30 PM, Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> 
> From time to time, new syscalls have been proposed to gain more observability
> for file-system:
> 
> (1) getvalues() [0]. It uses a hierarchical namespace API to gather and return
> multiple values in single syscall.
> (2) cachestat() [1].  It returns the cache status (e.g., number of dirty pages)
> of a given file in a scalable way.
> 
> All these proposals require adding a new syscall. Here I would like to propose
> another solution for file system observability: a bpf iterator for file system
> objects. The initial idea came when I was trying to implement a filefrag-like
> page cache tool with support for multi-order folios, so that we can know the
> number of multi-order folios and the orders of those folios in page cache. After
> developing a demo for it, I realized that we could use it to provide more
> observability for file system objects, e.g., dumping the per-cpu iostat for a
> super block [2], iterating all inodes in a super-block to dump info for
> specific inodes (e.g., unlinked but pinned inodes), or displaying the flags of a
> specific mount.
> 

Sounds like an interesting suggestion to me. :) Potentially, it could have more
applications.

> The BPF iterator was introduced in v5.8 [3] to support flexible content dumping
> for kernel objects. It works by creating a bpf iterator file [4], which is a
> seq-like read-only file, and the content of the bpf iterator file is determined
> by a previously loaded bpf program, so userspace can read the bpf iterator file
> to get the information it needs. However, there are some unresolved issues:
> (1) The privilege.
> Loading the bpf program requires CAP_SYS_ADMIN or CAP_BPF. This means that the
> observability will be available only to privileged processes. Maybe we can load
> the bpf program through a privileged process and make the bpf iterator file
> readable by normal users.
> (2) Prevent pinning the super-block
> In the current naive implementation, the bpf iterator simply pins the
> super-block of the passed fd and prevents the super-block from being destroyed.
> Perhaps fs-pin is a better choice, so the bpf iterator can be deactivated after
> the filesystem is umounted.
> 
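
For point (1), the consumer side could stay very small if the iterator file is
pinned by a privileged loader and made readable for other users. Below is only
a rough sketch of such a reader, under assumptions that are not part of the
proposal: the helper name dump_iter_file() is made up for illustration, and
the pin path mentioned afterwards is hypothetical. It relies on nothing more
than open(2) and read(2), because the iterator file behaves like a seq-like
read-only file.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy everything readable from a pinned bpf
 * iterator file to `out`. The file is treated as an ordinary read-only
 * file; the content is whatever the loaded bpf program emits.
 * Returns the number of bytes copied, or -1 on error.
 */
long dump_iter_file(const char *path, FILE *out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* read() returning 0 means the iterator has produced all its output. */
    while ((n = read(fd, buf, sizeof(buf))) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        if (fwrite(buf, 1, (size_t)n, out) != (size_t)n) {
            close(fd);
            return -1;
        }
        total += n;
    }

    close(fd);
    return total;
}
```

An unprivileged tool would then call something like
dump_iter_file("/sys/fs/bpf/fs_iter", stdout), where the path is an assumed
pin location chosen by the privileged loader. Whether the permissions on such
a pinned file are enough to address the privilege question is exactly the
open issue raised above.
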
> I hope to send out an RFC soon before LSF/MM/BPF for further discussion.
> 

It will be good to see the patchset. :)

Thanks,
Slava.

> [0]:
> https://lore.kernel.org/linux-fsdevel/YnEeuw6fd1A8usjj@xxxxxxxxxxxxxxxxxxxxxxxxx/
> [1]: https://lore.kernel.org/linux-mm/20230219073318.366189-1-nphamcs@xxxxxxxxx/
> [2]:
> https://lore.kernel.org/linux-fsdevel/CAJfpegsCKEx41KA1S2QJ9gX9BEBG4_d8igA0DT66GFH2ZanspA@xxxxxxxxxxxxxx/
> [3]: https://lore.kernel.org/bpf/20200509175859.2474608-1-yhs@xxxxxx/
> [4]: https://docs.kernel.org/bpf/bpf_iterators.html
> 




