On Tue, Dec 15, 2020 at 03:36:59PM -0800, Song Liu wrote: > +/* > + * Key information from vm_area_struct. We need this because we cannot > + * assume the vm_area_struct is still valid after each iteration. > + */ > +struct __vm_area_struct { > + __u64 start; > + __u64 end; > + __u64 flags; > + __u64 pgoff; > +}; Where it's inside .c or exposed in uapi/bpf.h it will become uapi if it's used this way. Let's switch to arbitrary BTF-based access instead. > +static struct __vm_area_struct * > +task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info) > +{ > + struct pid_namespace *ns = info->common.ns; > + struct task_struct *curr_task; > + struct vm_area_struct *vma; > + u32 curr_tid = info->tid; > + bool new_task = false; > + > + /* If this function returns a non-NULL __vm_area_struct, it held > + * a reference to the task_struct. If info->file is non-NULL, it > + * also holds a reference to the file. If this function returns > + * NULL, it does not hold any reference. > + */ > +again: > + if (info->task) { > + curr_task = info->task; > + } else { > + curr_task = task_seq_get_next(ns, &curr_tid, true); > + if (!curr_task) { > + info->task = NULL; > + info->tid++; > + return NULL; > + } > + > + if (curr_tid != info->tid) { > + info->tid = curr_tid; > + new_task = true; > + } > + > + if (!curr_task->mm) > + goto next_task; > + info->task = curr_task; > + } > + > + mmap_read_lock(curr_task->mm); That will hurt. /proc readers do that and it causes all sorts of production issues. We cannot take this lock. There is no need to take it. Switch the whole thing to probe_read style walking. And reimplement find_vma with probe_read while omitting vmacache. It will be short rbtree walk. bpf prog doesn't need to see a stable struct. It will read it through ptr_to_btf_id which will use probe_read equivalent underneath.