On Wed, Nov 6, 2024 at 2:31 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote: > > On 2024/11/6 21:31, Alexei Starovoitov wrote: > > On Wed, Nov 6, 2024 at 11:39 AM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote: > >> > >> This patch adds the open-coded iterator style process file iterator > >> kfuncs bpf_iter_task_file_{new,next,destroy} that iterates over all > >> files opened by the specified process. > > > > This is ok. > > > >> In addition, this patch adds bpf_iter_task_file_get_fd() getter to get > >> the file descriptor corresponding to the file in the current iteration. > > > > Unnecessary. Use CORE to read iter internal fields. > > > >> The reference to struct file acquired by the previous > >> bpf_iter_task_file_next() is released in the next > >> bpf_iter_task_file_next(), and the last reference is released in the > >> last bpf_iter_task_file_next() that returns NULL. > >> > >> In the bpf_iter_task_file_destroy(), if the iterator does not iterate to > >> the end, then the last struct file reference is released at this time. > >> > >> Signed-off-by: Juntong Deng <juntong.deng@xxxxxxxxxxx> > >> --- > >> kernel/bpf/helpers.c | 4 ++ > >> kernel/bpf/task_iter.c | 96 ++++++++++++++++++++++++++++++++++++++++++ > >> 2 files changed, 100 insertions(+) > >> > >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > >> index 395221e53832..1f0f7ca1c47a 100644 > >> --- a/kernel/bpf/helpers.c > >> +++ b/kernel/bpf/helpers.c > >> @@ -3096,6 +3096,10 @@ BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY) > >> BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) > >> BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL) > >> BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_new, KF_ITER_NEW | KF_TRUSTED_ARGS) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_next, KF_ITER_NEXT | KF_RET_NULL) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_get_fd) > >> +BTF_ID_FLAGS(func, bpf_iter_task_file_destroy, KF_ITER_DESTROY) > >> BTF_ID_FLAGS(func, bpf_dynptr_adjust) > >> BTF_ID_FLAGS(func, bpf_dynptr_is_null) > >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) > >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c > >> index 5af9e130e500..32e15403a5a6 100644 > >> --- a/kernel/bpf/task_iter.c > >> +++ b/kernel/bpf/task_iter.c > >> @@ -1031,6 +1031,102 @@ __bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it) > >> { > >> } > >> > >> +struct bpf_iter_task_file { > >> + __u64 __opaque[3]; > >> +} __aligned(8); > >> + > >> +struct bpf_iter_task_file_kern { > >> + struct task_struct *task; > >> + struct file *file; > >> + int fd; > >> +} __aligned(8); > >> + > >> +/** > >> + * bpf_iter_task_file_new() - Initialize a new task file iterator for a task, > >> + * used to iterate over all files opened by a specified task > >> + * > >> + * @it: the new bpf_iter_task_file to be created > >> + * @task: a pointer pointing to a task to be iterated over > >> + */ > >> +__bpf_kfunc int bpf_iter_task_file_new(struct bpf_iter_task_file *it, > >> + struct task_struct *task) > >> +{ > >> + struct bpf_iter_task_file_kern *kit = (void *)it; > >> + > >> + BUILD_BUG_ON(sizeof(struct bpf_iter_task_file_kern) > sizeof(struct bpf_iter_task_file)); > >> + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_file_kern) != > >> + __alignof__(struct bpf_iter_task_file)); > >> + > >> + kit->task = task; > > > > This is broken, since task refcnt can drop while iter is running. > > > > Before doing any of that I'd like to see a long term path for crib. > > All these small additions are ok if they're generic and useful elsewhere. > > I'm afraid there is no path forward for crib itself though. > > > > pw-bot: cr > > Thanks for your reply. > > The long-term path of CRIB is consistent with the initial goal, adding > kfuncs to help the bpf program obtain process-related information. > > I think most of the CRIB kfuncs are generic, such as process file > iterator, skb iterator, bpf_fget_task() that gets struct file based on > file descriptor, etc. > > This is because obtaining process-related information is not a > requirement specific to checkpoint/restore scenarios, but is > required in other scenarios as well. > > Here I would like to quote your vision on LPC 2022 [0] [1]. :) The reading part via iterators and access to kernel internals is fine, but to complete CRIB idea the restore side is necessary and for that bit I haven't heard a complete story that would be acceptable upstream. At LPC the proposal was to add kfuncs that will write into kernel data structures. That part won't fly, since I don't see how one can make such writing kfuncs safe. Restoring a socket, tcp connection, etc is not a trivial process. Just building a set of generic abstractions for reading is ok-ish, since they're generic and reusable, but the end to end story is necessary before we proceed.