On Thu, Dec 5, 2019 at 9:35 AM Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote: > > On 12/5/2019 9:05 AM, Alexey Budankov wrote: > > Hello Casey, > > > > On 05.12.2019 19:49, Casey Schaufler wrote: > >> On 12/5/2019 8:15 AM, Alexey Budankov wrote: > >>> Currently access to perf_events functionality [1] beyond the scope permitted > >>> by perf_event_paranoid [1] kernel setting is allowed to a privileged process > >>> [2] with CAP_SYS_ADMIN capability enabled in the process effective set [3]. > >>> > >>> This patch set introduces CAP_SYS_PERFMON capability devoted to secure performance > >>> monitoring activity so that CAP_SYS_PERFMON would assist CAP_SYS_ADMIN in its > >>> governing role for perf_events based performance monitoring of a system. > >>> > >>> CAP_SYS_PERFMON aims to harden system security and integrity when monitoring > >>> performance using perf_events subsystem by processes and Perf privileged users > >>> [2], thus decreasing attack surface that is available to CAP_SYS_ADMIN > >>> privileged processes [3]. > >> Are there use cases where you would need CAP_SYS_PERFMON where you > >> would not also need CAP_SYS_ADMIN? If you separate a new capability > > Actually, there are. Perf tool that has record, stat and top modes could run with > > CAP_SYS_PERFMON capability as mentioned below and provide system wide performance > > data. Currently for that to work the tool needs to be granted with CAP_SYS_ADMIN. > > The question isn't whether the tool could use the capability, it's whether > the tool would also need CAP_SYS_ADMIN to be useful. Are there existing > tools that could stop using CAP_SYS_ADMIN in favor of CAP_SYS_PERFMON? The answer is yes. I have recently been alerted to a problem with paranoid=2 and the popular rr debugger (https://rr-project.org/). This debugger uses several perf_events features, including profiling of PMU events and tracepoints (context-switches). With paranoid=2, it does not work anymore. We would need a privilege between regular user and admin to make it work again. Note that context switches tracepoint is only applied to self (not system-wide). > My bet is that any tool that does performance monitoring is going to need > CAP_SYS_ADMIN for other reasons. > > > > >> from CAP_SYS_ADMIN but always have to use CAP_SYS_ADMIN in conjunction > >> with the new capability it is all rather pointless. > >> > >> The scope you've defined for this CAP_SYS_PERFMON is very small. > >> Is there a larger set of privilege checks that might be applicable > >> for it? > > CAP_SYS_PERFMON could be applied broadly, though, this patch set enables record > > and stat mode use cases for system wide performance monitoring in kernel and > > user modes. > > The granularity of capabilities is something we have to watch > very carefully. Sure, CAP_SYS_ADMIN covers a lot of things, but > if we broke it up "properly" we'd have hundreds of capabilities. > If you want control that finely we have SELinux. > > > > > Thanks, > > Alexey > > > >> > >> > >>> CAP_SYS_PERFMON aims to take over CAP_SYS_ADMIN credentials related to > >>> performance monitoring functionality of perf_events and balance amount of > >>> CAP_SYS_ADMIN credentials in accordance with the recommendations provided in > >>> the man page for CAP_SYS_ADMIN [3]: "Note: this capability is overloaded; > >>> see Notes to kernel developers, below." > >>> > >>> For backward compatibility reasons performance monitoring functionality of > >>> perf_events subsystem remains available under CAP_SYS_ADMIN but its usage for > >>> secure performance monitoring use cases is discouraged with respect to the > >>> introduced CAP_SYS_PERFMON capability. > >>> > >>> In the suggested implementation CAP_SYS_PERFMON enables Perf privileged users > >>> [2] to conduct secure performance monitoring using perf_events in the scope > >>> of available online CPUs when executing code in kernel and user modes. > >>> > >>> Possible alternative solution to this capabilities balancing, system security > >>> hardening task could be to use the existing CAP_SYS_PTRACE capability to govern > >>> perf_events' performance monitoring functionality, since process debugging is > >>> similar to performance monitoring with respect to providing insights into > >>> process memory and execution details. However CAP_SYS_PTRACE still provides > >>> users with more credentials than are required for secure performance monitoring > >>> using perf_events subsystem and this excess is avoided by using the dedicated > >>> CAP_SYS_PERFMON capability. > >>> > >>> libcap library utilities [4], [5] and Perf tool can be used to apply > >>> CAP_SYS_PERFMON capability for secure performance monitoring beyond the scope > >>> permitted by system wide perf_event_paranoid kernel setting and below are the > >>> steps to evaluate the advancement suggested by the patch set: > >>> > >>> - patch, build and boot the kernel > >>> - patch, build Perf tool e.g. to /home/user/perf > >>> ... > >>> # git clone git://git.kernel.org/pub/scm/libs/libcap/libcap.git libcap > >>> # pushd libcap > >>> # patch libcap/include/uapi/linux/capabilities.h with [PATCH 1/3] > >>> # make > >>> # pushd progs > >>> # ./setcap "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf > >>> # ./setcap -v "cap_sys_perfmon,cap_sys_ptrace,cap_syslog=ep" /home/user/perf > >>> /home/user/perf: OK > >>> # ./getcap /home/user/perf > >>> /home/user/perf = cap_sys_ptrace,cap_syslog,cap_sys_perfmon+ep > >>> # echo 2 > /proc/sys/kernel/perf_event_paranoid > >>> # cat /proc/sys/kernel/perf_event_paranoid > >>> 2 > >>> ... > >>> $ /home/user/perf top > >>> ... works as expected ... > >>> $ cat /proc/`pidof perf`/status > >>> Name: perf > >>> Umask: 0002 > >>> State: S (sleeping) > >>> Tgid: 2958 > >>> Ngid: 0 > >>> Pid: 2958 > >>> PPid: 9847 > >>> TracerPid: 0 > >>> Uid: 500 500 500 500 > >>> Gid: 500 500 500 500 > >>> FDSize: 256 > >>> ... > >>> CapInh: 0000000000000000 > >>> CapPrm: 0000004400080000 > >>> CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000 > >>> cap_sys_perfmon,cap_sys_ptrace,cap_syslog > >>> CapBnd: 0000007fffffffff > >>> CapAmb: 0000000000000000 > >>> NoNewPrivs: 0 > >>> Seccomp: 0 > >>> Speculation_Store_Bypass: thread vulnerable > >>> Cpus_allowed: ff > >>> Cpus_allowed_list: 0-7 > >>> ... > >>> > >>> Usage of cap_sys_perfmon effectively avoids unused credentials excess: > >>> - with cap_sys_admin: > >>> CapEff: 0000007fffffffff => 01111111 11111111 11111111 11111111 11111111 > >>> - with cap_sys_perfmon: > >>> CapEff: 0000004400080000 => 01000100 00000000 00001000 00000000 00000000 > >>> 38 34 19 > >>> sys_perfmon syslog sys_ptrace > >>> > >>> The patch set is for tip perf/core repository: > >>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip perf/core > >>> tip sha1: ceb9e77324fa661b1001a0ae66f061b5fcb4e4e6 > >>> > >>> [1] http://man7.org/linux/man-pages/man2/perf_event_open.2.html > >>> [2] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html > >>> [3] http://man7.org/linux/man-pages/man7/capabilities.7.html > >>> [4] http://man7.org/linux/man-pages/man8/setcap.8.html > >>> [5] https://git.kernel.org/pub/scm/libs/libcap/libcap.git > >>> [6] https://sites.google.com/site/fullycapable/, posix_1003.1e-990310.pdf > >>> > >>> --- > >>> Alexey Budankov (3): > >>> capabilities: introduce CAP_SYS_PERFMON to kernel and user space > >>> perf/core: apply CAP_SYS_PERFMON to CPUs and kernel monitoring > >>> perf tool: extend Perf tool with CAP_SYS_PERFMON support > >>> > >>> include/linux/perf_event.h | 6 ++++-- > >>> include/uapi/linux/capability.h | 10 +++++++++- > >>> security/selinux/include/classmap.h | 4 ++-- > >>> tools/perf/design.txt | 3 ++- > >>> tools/perf/util/cap.h | 4 ++++ > >>> tools/perf/util/evsel.c | 10 +++++----- > >>> tools/perf/util/util.c | 15 +++++++++++++-- > >>> 7 files changed, 39 insertions(+), 13 deletions(-) > >>> > >> >