Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sat, 27 Aug 2016 11:06:44 -0700

On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >>
> >>>
> >>> - I don't think such 'for' loop can scale. The solution needs to work
> >>> with thousands of containers and thousands of cgroups.
> >>> In the patch 06/10 the proposal is to use 'current' as holder of
> >>> the programs:
> >>> +   for (prog = current->seccomp.landlock_prog;
> >>> +                   prog; prog = prog->prev) {
> >>> +           if (prog->filter == landlock_ret->filter) {
> >>> +                   cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> >>> +                   break;
> >>> +           }
> >>> +   }
> >>> imo that's the root of scalability issue.
> >>> I think to be able to scale the bpf programs have to be attached to
> >>> cgroups instead of tasks.
> >>> That would be very different api. seccomp doesn't need to be touched.
> >>> But that is the only way I see to be able to scale.
> >>
> >> Landlock is inspired from seccomp which also use a BPF program per
> >> thread. For seccomp, each BPF programs are executed for each syscall.
> >> For Landlock, some BPF programs are executed for some LSM hooks. I don't
> >> see why it is a scale issue for Landlock comparing to seccomp. I also
> >> don't see why storing the BPF program list pointer in the cgroup struct
> >> instead of the task struct change a lot here. The BPF programs execution
> >> will be the same anyway (for each LSM hook). Kees should probably have a
> >> better opinion on this.
> > 
> > seccomp has its own issues and copying them doesn't make this lsm any better.
> > Like seccomp bpf programs are all gigantic switch statement that looks
> > for interesting syscall numbers. All syscalls of a task are paying
> > non-trivial seccomp penalty due to such design. If bpf was attached per
> > syscall it would have been much cheaper. Of course doing it this way
> > for seccomp is not easy, but for lsm such facility is already there.
> > Blank call of a single bpf prog for all lsm hooks is unnecessary
> > overhead that can and should be avoided.
> 
> It's probably a misunderstanding. Contrary to seccomp which run all the
> thread's BPF programs for any syscall, Landlock only run eBPF programs
> for the triggered LSM hooks, if their type match. Indeed, thanks to the
> multiple eBPF program types and contrary to seccomp, Landlock only run
> an eBPF program when needed. Landlock will have almost no performance
> overhead if the syscalls do not trigger the watched LSM hooks for the
> current process.

that's not what I see in the patch 06/10:
all lsm_hooks in 'static struct security_hook_list landlock_hooks'
(which eventually means all lsm hooks) will call
static inline int landlock_hook_##NAME
which will call landlock_run_prog()
which does:
+ for (landlock_ret = current->seccomp.landlock_ret;
+      landlock_ret; landlock_ret = landlock_ret->prev) {
+    if (landlock_ret->triggered) {
+       ctx.cookie = landlock_ret->cookie;
+       for (prog = current->seccomp.landlock_prog;
+            prog; prog = prog->prev) {
+               if (prog->filter == landlock_ret->filter) {
+                       cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
+                       break;
+               }
+       }

that is unacceptable overhead and not a scalable design.
It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
as soon as more lsm hooks are added.

> As said above, Landlock will not run an eBPF programs when not strictly
> needed. Attaching to a cgroup will have the same performance impact as
> attaching to a process hierarchy.

Having a prog per cgroup per lsm_hook is the only scalable way I
could come up with. If you see another way, please propose.
current->seccomp.landlock_prog is not the answer.

--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html