Re: [PATCH bpf-next 1/3] bpf: Parameterize task iterators.

Hao Luo <haoluo@xxxxxxxxxx> · Thu, 28 Jul 2022 12:11:08 -0700

Hi Kumar,

On Thu, Jul 28, 2022 at 10:53 AM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:
>
> On Thu, 28 Jul 2022 at 19:08, Yonghong Song <yhs@xxxxxx> wrote:
> > > [...]
> >
> > There is one problem here. The current pidfd_open syscall
> > only supports thread-group leader, i.e., main thread, i.e.,
> > it won't support any non-main-thread tid's. Yes, thread-group
> > leader and other threads should share the same vma and files
> > in most of times, but it still possible different threads
> > in the same process may have different files which is why
> > in current task_iter.c we have:
> >                  *tid = pid_nr_ns(pid, ns);
> >                  task = get_pid_task(pid, PIDTYPE_PID);
> >                  if (!task) {
> >                          ++*tid;
> >                          goto retry;
> >                  } else if (skip_if_dup_files &&
> > !thread_group_leader(task) &&
> >                             task->files == task->group_leader->files) {
> >                          put_task_struct(task);
> >                          task = NULL;
> >                          ++*tid;
> >                          goto retry;
> >                  }
> >
> >
> > Each thread (tid) will have some fields different from
> > thread-group leader (tgid), e.g., comm and most (if not all)
> > scheduling related fields.
> >
> > So it would be good to support for each tid from the start
> > as it is not clear when pidfd_open will support non
> > thread-group leader.
>
> I think this is just a missing feature, not a design limitation. If we
> just disable thread pifdfd from waitid and pidfd_send_signal, I think
> it is ok to use it elsewhere.
>
> >
> > If it worries wrap around, a reference to the task
> > can be held when tid passed to the kernel at link
> > create time. This is similar to pid is passed to
> > the kernel at pidfd_open syscall. But in practice,
> > I think taking the reference during read() should
> > also fine. The race always exist anyway.
> >
> > Kumar, could you give more details about security
> > concerns? I am not sure about the tight relationship
> > between bpf_iter and security. bpf_iter just for
> > iterating kernel data structures.
> >
>
> There is no security 'concern' per se. But having an fd which is used
> to set up the iterator just gives a chance to a BPF LSM to easily
> isolate permissions to attach the iterator to a task represented by
> that fd. i.e. the fd is an object to which this permission can be
> bound, the fd can be passed around (to share the same permission with
> others but it is limited to the task corresponding to it), etc. The
> permission management is even easier and faster once you have a file
> local storage map (which I plan to send soon).
>

I like the idea of a file local storage map. I have several questions
in mind, but don't want to digress from the discussion under
Kui-Feng's patchset. It probably will be clear when seeing your
change. Please cc me when you send it out. thanks.

> So you could have two pidfds, one which allows the process to attach
> the iter to it, and one which doesn't, without relying on the task's
> capabilities, the checks can become more fine grained, and the
> original task can even drop its capabilities (because they are bound
> to the fd instead), which allows privilege separation.
>
> It becomes a bit difficult when kernel APIs take IDs because then you
> don't have any subject (other than the task) to draw the permissions
> from.
>
> But anyway, all of this was just a suggestion (which is why I
> solicited opinions in the original reply). Feel free to drop/ignore if
> it is too much of a hassle to support (i.e. it is understandable you
> wouldn't want to spend time extending pidfd_open for this).

On another thread, I was having a discussion with Tejun on FD vs ID
for cgroup_iter. I am in favor of ID in general, because it's
stateless and matches the info reported by bpf_link_info. This is nice
from the userspace's perspective.

Hao