Re: [PATCH v3 seccomp 5/5] seccomp/cache: Report cache data through /proc/pid/seccomp_cache

On Wed, Sep 30, 2020 at 5:20 PM YiFei Zhu <zhuyifei1999@xxxxxxxxx> wrote:
> Currently the kernel does not provide an infrastructure to translate
> architecture numbers to a human-readable name. Translating syscall
> numbers to syscall names is possible through FTRACE_SYSCALL
> infrastructure but it does not provide support for compat syscalls.
>
> This will create a file for each PID as /proc/pid/seccomp_cache.
> The file will be empty when no seccomp filters are loaded, or be
> in the format of:
> <arch name> <decimal syscall number> <ALLOW | FILTER>
> where ALLOW means the cache is guaranteed to allow the syscall,
> and filter means the cache will pass the syscall to the BPF filter.
>
> For the docker default profile on x86_64 it looks like:
> x86_64 0 ALLOW
> x86_64 1 ALLOW
> x86_64 2 ALLOW
> x86_64 3 ALLOW
> [...]
> x86_64 132 ALLOW
> x86_64 133 ALLOW
> x86_64 134 FILTER
> x86_64 135 FILTER
> x86_64 136 FILTER
> x86_64 137 ALLOW
> x86_64 138 ALLOW
> x86_64 139 FILTER
> x86_64 140 ALLOW
> x86_64 141 ALLOW
> [...]

Oooh, neat! :) Thanks!
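
For anyone who wants to consume this from userspace, here is a minimal
sketch of parsing one line of the format quoted above. It assumes the
format stays exactly as proposed (the Kconfig help text says it is subject
to change). The struct and function names are made up for illustration and
are not part of the patch:

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of the proposed /proc/<pid>/seccomp_cache output. */
struct seccomp_cache_entry {
	char arch[32];	/* architecture name, e.g. "x86_64" */
	int nr;		/* decimal syscall number */
	int allow;	/* 1 for ALLOW, 0 for FILTER */
};

/*
 * Parse "<arch name> <decimal syscall number> <ALLOW | FILTER>".
 * Returns 0 on success, -1 if the line does not match that format.
 * (Hypothetical helper, not something the kernel provides.)
 */
static int parse_seccomp_cache_line(const char *line,
				    struct seccomp_cache_entry *e)
{
	char verdict[8];

	/* Field widths keep sscanf() from overflowing the buffers. */
	if (sscanf(line, "%31s %d %7s", e->arch, &e->nr, verdict) != 3)
		return -1;
	if (e->nr < 0)
		return -1;

	if (strcmp(verdict, "ALLOW") == 0)
		e->allow = 1;
	else if (strcmp(verdict, "FILTER") == 0)
		e->allow = 0;
	else
		return -1;

	return 0;
}
```

A caller would read the file line by line (for example with fgets()) and
pass each line to parse_seccomp_cache_line(). Reading the file itself
needs CAP_SYS_ADMIN, per the patch.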

> Suggested-by: Jann Horn <jannh@xxxxxxxxxx>
> Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@xxxxxxxxxxxxxx/
> Signed-off-by: YiFei Zhu <yifeifz2@xxxxxxxxxxxx>
> ---
>  arch/Kconfig                   | 15 +++++++++++
>  arch/x86/include/asm/seccomp.h |  3 +++
>  fs/proc/base.c                 |  3 +++
>  include/linux/seccomp.h        |  5 ++++
>  kernel/seccomp.c               | 46 ++++++++++++++++++++++++++++++++++
>  5 files changed, 72 insertions(+)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index ca867b2a5d71..b840cadcc882 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -478,6 +478,7 @@ config HAVE_ARCH_SECCOMP_CACHE_NR_ONLY
>           - all the requirements for HAVE_ARCH_SECCOMP_FILTER
>           - SECCOMP_ARCH_DEFAULT
>           - SECCOMP_ARCH_DEFAULT_NR
> +         - SECCOMP_ARCH_DEFAULT_NAME
>
>  config SECCOMP
>         prompt "Enable seccomp to safely execute untrusted bytecode"
> @@ -532,6 +533,20 @@ config SECCOMP_CACHE_NR_ONLY
>
>  endchoice
>
> +config DEBUG_SECCOMP_CACHE
> +       bool "Show seccomp filter cache status in /proc/pid/seccomp_cache"
> +       depends on SECCOMP_CACHE_NR_ONLY
> +       depends on PROC_FS
> +       help
> +         This is enables /proc/pid/seccomp_cache interface to monitor

nit: s/is enables/enables/

> +         seccomp cache data. The file format is subject to change. Reading
> +         the file requires CAP_SYS_ADMIN.
> +
> +         This option is for debugging only. Enabling present the risk that
> +         an adversary may be able to infer the seccomp filter logic.
> +
> +         If unsure, say N.
> +
[...]
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
[...]
> +int proc_pid_seccomp_cache(struct seq_file *m, struct pid_namespace *ns,
> +                          struct pid *pid, struct task_struct *task)
> +{
> +       struct seccomp_filter *f;
> +
> +       /*
> +        * We don't want some sandboxed process know what their seccomp
> +        * filters consist of.
> +        */
> +       if (!file_ns_capable(m->file, &init_user_ns, CAP_SYS_ADMIN))
> +               return -EACCES;
> +
> +       f = READ_ONCE(task->seccomp.filter);
> +       if (!f)
> +               return 0;

Hmm, this won't work, because the task could be exiting, and seccomp
filters are detached in release_task() (using
seccomp_filter_release()). At the moment, seccomp_filter_release()
just locklessly NULLs out the tsk->seccomp.filter pointer and drops
the reference, so the filter you read here can be freed while you
are still dereferencing it.

The locking here is kind of gross, but basically I think you can
change this code to use lock_task_sighand() / unlock_task_sighand()
(see the other examples in fs/proc/base.c), and bail out if
lock_task_sighand() returns NULL. And in seccomp_filter_release(), add
something like this:

/* We are effectively holding the siglock by not having any sighand. */
WARN_ON(tsk->sighand != NULL);
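
To make the intended control flow concrete, here is a small
single-threaded userspace model of it. This is not kernel code and it
does not exercise any real concurrency. Every model_* name is a made-up
stand-in, the "lock" is just a flag, and returning -ESRCH on the bail-out
path is my assumption (any suitable error would do). It only shows the
shape: take the siglock, bail out if there is no sighand, touch the
filter only while the lock is held, and have the release side rely on
sighand already being gone:

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins for illustration; not the kernel's definitions. */
struct model_sighand { int siglock_held; };
struct model_filter { int allow_nr0; };
struct model_task {
	struct model_sighand *sighand;	/* NULL once the task is released */
	struct model_filter *filter;
};

/* Models lock_task_sighand(): NULL means the task is already gone. */
static struct model_sighand *model_lock_task_sighand(struct model_task *t)
{
	if (!t->sighand)
		return NULL;
	t->sighand->siglock_held = 1;
	return t->sighand;
}

static void model_unlock_task_sighand(struct model_task *t)
{
	t->sighand->siglock_held = 0;
}

/*
 * Models the reader: only look at t->filter while the siglock is held.
 * Returns -ESRCH if the task has no sighand; otherwise returns 0 and
 * stores the cached verdict for syscall 0 in *allow (left untouched
 * when no filter is loaded).
 */
static int model_read_cache(struct model_task *t, int *allow)
{
	struct model_filter *f;

	if (!model_lock_task_sighand(t))
		return -ESRCH;
	f = t->filter;
	if (f)
		*allow = f->allow_nr0;
	model_unlock_task_sighand(t);
	return 0;
}

/*
 * Models the release side. The invariant from the suggested WARN_ON is
 * that sighand is already NULL here, so no reader can be holding the
 * siglock. Returns -1 if that invariant is violated, 0 otherwise.
 */
static int model_filter_release(struct model_task *t)
{
	if (t->sighand != NULL)
		return -1;
	t->filter = NULL;
	return 0;
}
```

In the real function you would use lock_task_sighand(task, &flags) and
unlock_task_sighand(task, &flags), and you would also have to decide
whether to hold the lock while printing or pin the filter some other
way. The model above does not try to answer that.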

> +#ifdef SECCOMP_ARCH_DEFAULT
> +       proc_pid_seccomp_cache_arch(m, SECCOMP_ARCH_DEFAULT_NAME,
> +                                   f->cache.syscall_allow_default,
> +                                   SECCOMP_ARCH_DEFAULT_NR);
> +#endif /* SECCOMP_ARCH_DEFAULT */
> +
> +#ifdef SECCOMP_ARCH_COMPAT
> +       proc_pid_seccomp_cache_arch(m, SECCOMP_ARCH_COMPAT_NAME,
> +                                   f->cache.syscall_allow_compat,
> +                                   SECCOMP_ARCH_COMPAT_NR);
> +#endif /* SECCOMP_ARCH_COMPAT */
> +       return 0;
> +}
> +#endif /* CONFIG_DEBUG_SECCOMP_CACHE */
> --
> 2.28.0
>