On 07/11/16 16:05, Topi Miettinen wrote:
> On 07/11/16 15:25, Serge E. Hallyn wrote:
>> Quoting Topi Miettinen (toiwoton@xxxxxxxxx):
>>> There are many basic ways to control processes, including capabilities,
>>> cgroups and resource limits. However, there are far fewer ways to find
>>> out useful values for the limits, except blind trial and error.
>>>
>>> Currently, there is no way to know which capabilities are actually used.
>>> Even the source code is only implicit; in-depth knowledge of each
>>> capability must be used when analyzing a program to judge which
>>> capabilities the program will exercise.
>>>
>>> Generate an audit message at system call exit, when capabilities are used.
>>> This can then be used to configure capability sets for services by a
>>> software developer, maintainer or system administrator.
>>>
>>> Test case demonstrating basic capability monitoring with the new
>>> message types 1330 and 1331 and how the cgroups are displayed (boot to
>>> rdshell):
>>
>> Thanks, Topi, I'll find time this week to look this over in detail.
>>
>> How much chattier does this make the syslog/journald during a regular
>> boot?  I was thinking "this is audit, we can choose what messages
>> will show up", but I guess that's only what auditd actually listens to,
>> not what kernel emits?  (sorry I've not looked at audit in a long
>> time).  Drat, that makes it seem like tracepoints would be better
>> after all.  But let's see how much it adds to the noise.
>
> For example "loadkeys" causes thousands of entries. :-( I'm checking how
> to avoid audit message rate limiting, now some messages are lost.
>
> It's still too easy to drown the logs with noise. That could be limited
> a lot by emitting a message only when the capability is used for the
> first time. But the question is how to define where to start counting
> (fork, exec, and/or setpcap?). I'm also not sure if that is the right
> way to log, since the first use of a capability could be expected and an
> innocent one, but then the 100th one could be malicious.
>
> It's also very complex and error-prone to collect a capability mask from
> audit logs, which was my original goal.

What if only a summary of capabilities was logged at task exit? That
should make the log volume reasonable.

-Topi

>
> -Topi
>
>>
>>> BusyBox v1.22.1 (Debian 1:1.22.0-19) built-in shell (ash)
>>> Enter 'help' for a list of built-in commands.
>>>
>>> (initramfs) cd /sys/fs
>>> (initramfs) mount -t cgroup2 cgroup cgroup
>>> [   12.343152] audit_printk_skb: 5886 callbacks suppressed
>>> [   12.355214] audit: type=1300 audit(1468234317.100:518): arch=c000003e syscall=165 success=yes exit=0 a0=7fffe1e9ae2d a1=7fffe1e9ae34 a2=7fffe1e9ae25 a3=8000 items=0 ppid=469 pid=470 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="mount" exe="/bin/mount" key=(null)
>>> [   12.414853] audit: type=1327 audit(1468234317.100:518): proctitle=6D6F756E74002D74006367726F757032006367726F7570006367726F7570
>>> [   12.438338] audit: type=1330 audit(1468234317.100:518): cap_used=0000000000200000
>>> [   12.453893] audit: type=1331 audit(1468234317.100:518): cgroups=:/;
>>> (initramfs) cd cgroup
>>> (initramfs) mkdir test; cd test
>>> [   17.335625] audit: type=1300 audit(1468234322.092:519): arch=c000003e syscall=83 success=yes exit=0 a0=7ffddfd75e29 a1=1ff a2=0 a3=1e2 items=0 ppid=469 pid=471 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="mkdir" exe="/bin/mkdir" key=(null)
>>> [   17.392686] audit: type=1327 audit(1468234322.092:519): proctitle=6D6B6469720074657374
>>> [   17.409404] audit: type=1330 audit(1468234322.092:519): cap_used=0000000000000002
>>> [   17.425404] audit: type=1331 audit(1468234322.092:519): cgroups=:/;
>>> (initramfs) echo $$ >cgroup.procs
>>> (initramfs) mknod /dev/z_$$ c 1 2
>>> [   28.385681] audit: type=1300 audit(1468234333.144:520): arch=c000003e syscall=133 success=yes exit=0 a0=7ffe16324e11 a1=21b6 a2=102 a3=5c9 items=0 ppid=469 pid=472 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="mknod" exe="/bin/mknod" key=(null)
>>> [   28.443674] audit: type=1327 audit(1468234333.144:520): proctitle=6D6B6E6F64002F6465762F7A5F343639006300310032
>>> [   28.465888] audit: type=1330 audit(1468234333.144:520): cap_used=0000000008000000
>>> [   28.482080] audit: type=1331 audit(1468234333.144:520): cgroups=:/test;
>>> (initramfs) chown 1234 /dev/z_*
>>> [   34.772992] audit: type=1300 audit(1468234339.532:521): arch=c000003e syscall=92 success=yes exit=0 a0=7ffd0b563e17 a1=4d2 a2=0 a3=60a items=0 ppid=469 pid=473 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="chown" exe="/bin/chown" key=(null)
>>> [   34.828569] audit: type=1327 audit(1468234339.532:521): proctitle=63686F776E0031323334002F6465762F7A5F343639
>>> [   34.848747] audit: type=1330 audit(1468234339.532:521): cap_used=0000000000000001
>>> [   34.864404] audit: type=1331 audit(1468234339.532:521): cgroups=:/test;
>>>
>>> Signed-off-by: Topi Miettinen <toiwoton@xxxxxxxxx>
>>> ---
>>>  include/linux/audit.h      |  4 +++
>>>  include/linux/cgroup.h     |  2 ++
>>>  include/uapi/linux/audit.h |  2 ++
>>>  kernel/audit.c             |  7 +++---
>>>  kernel/audit.h             |  1 +
>>>  kernel/auditsc.c           | 28 ++++++++++++++++++++-
>>>  kernel/capability.c        |  5 ++--
>>>  kernel/cgroup.c            | 62 ++++++++++++++++++++++++++++++++++++++++++++++
>>>  8 files changed, 105 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/include/linux/audit.h b/include/linux/audit.h
>>> index e38e3fc..971cb2e 100644
>>> --- a/include/linux/audit.h
>>> +++ b/include/linux/audit.h
>>> @@ -438,6 +438,8 @@ static inline void audit_mmap_fd(int fd, int flags)
>>>  		__audit_mmap_fd(fd, flags);
>>>  }
>>>
>>> +extern void audit_log_cap_use(int cap);
>>> +
>>>  extern int audit_n_rules;
>>>  extern int audit_signals;
>>>  #else /* CONFIG_AUDITSYSCALL */
>>> @@ -545,6 +547,8 @@ static inline void audit_mmap_fd(int fd, int flags)
>>>  { }
>>>  static inline void audit_ptrace(struct task_struct *t)
>>>  { }
>>> +static inline void audit_log_cap_use(int cap)
>>> +{ }
>>>  #define audit_n_rules 0
>>>  #define audit_signals 0
>>>  #endif /* CONFIG_AUDITSYSCALL */
>>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>>> index a20320c..b5dc8aa 100644
>>> --- a/include/linux/cgroup.h
>>> +++ b/include/linux/cgroup.h
>>> @@ -100,6 +100,8 @@ char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
>>>  int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry);
>>>  int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
>>>  		     struct pid *pid, struct task_struct *tsk);
>>> +struct audit_buffer;
>>> +void audit_cgroup_list(struct audit_buffer *ab);
>>>
>>>  void cgroup_fork(struct task_struct *p);
>>>  extern int cgroup_can_fork(struct task_struct *p);
>>> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
>>> index d820aa9..c1ae016 100644
>>> --- a/include/uapi/linux/audit.h
>>> +++ b/include/uapi/linux/audit.h
>>> @@ -111,6 +111,8 @@
>>>  #define AUDIT_PROCTITLE		1327	/* Proctitle emit event */
>>>  #define AUDIT_FEATURE_CHANGE	1328	/* audit log listing feature changes */
>>>  #define AUDIT_REPLACE		1329	/* Replace auditd if this packet unanswerd */
>>> +#define AUDIT_CAPABILITY	1330	/* Record showing capability use */
>>> +#define AUDIT_CGROUP		1331	/* Record showing cgroups */
>>>
>>>  #define AUDIT_AVC		1400	/* SE Linux avc denial or grant */
>>>  #define AUDIT_SELINUX_ERR	1401	/* Internal SE Linux Errors */
>>> diff --git a/kernel/audit.c b/kernel/audit.c
>>> index 8d528f9..98dd920 100644
>>> --- a/kernel/audit.c
>>> +++ b/kernel/audit.c
>>> @@ -54,6 +54,7 @@
>>>  #include <linux/kthread.h>
>>>  #include <linux/kernel.h>
>>>  #include <linux/syscalls.h>
>>> +#include <linux/cgroup.h>
>>>
>>>  #include <linux/audit.h>
>>>
>>> @@ -1682,7 +1683,7 @@ void audit_log_cap(struct audit_buffer *ab, char *prefix, kernel_cap_t *cap)
>>>  {
>>>  	int i;
>>>
>>> -	audit_log_format(ab, " %s=", prefix);
>>> +	audit_log_format(ab, "%s=", prefix);
>>>  	CAP_FOR_EACH_U32(i) {
>>>  		audit_log_format(ab, "%08x",
>>>  				 cap->cap[CAP_LAST_U32 - i]);
>>> @@ -1696,11 +1697,11 @@ static void audit_log_fcaps(struct audit_buffer *ab, struct audit_names *name)
>>>  	int log = 0;
>>>
>>>  	if (!cap_isclear(*perm)) {
>>> -		audit_log_cap(ab, "cap_fp", perm);
>>> +		audit_log_cap(ab, " cap_fp", perm);
>>>  		log = 1;
>>>  	}
>>>  	if (!cap_isclear(*inh)) {
>>> -		audit_log_cap(ab, "cap_fi", inh);
>>> +		audit_log_cap(ab, " cap_fi", inh);
>>>  		log = 1;
>>>  	}
>>>
>>> diff --git a/kernel/audit.h b/kernel/audit.h
>>> index a492f4c..680e8b5 100644
>>> --- a/kernel/audit.h
>>> +++ b/kernel/audit.h
>>> @@ -202,6 +202,7 @@ struct audit_context {
>>>  	};
>>>  	int fds[2];
>>>  	struct audit_proctitle proctitle;
>>> +	kernel_cap_t cap_used;
>>>  };
>>>
>>>  extern u32 audit_ever_enabled;
>>> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>>> index 2672d10..32c3813 100644
>>> --- a/kernel/auditsc.c
>>> +++ b/kernel/auditsc.c
>>> @@ -197,7 +197,6 @@ static int audit_match_filetype(struct audit_context *ctx, int val)
>>>   * References in it _are_ dropped - at the same time we free/drop aux stuff.
>>>   */
>>>
>>> -#ifdef CONFIG_AUDIT_TREE
>>>  static void audit_set_auditable(struct audit_context *ctx)
>>>  {
>>>  	if (!ctx->prio) {
>>> @@ -206,6 +205,7 @@ static void audit_set_auditable(struct audit_context *ctx)
>>>  	}
>>>  }
>>>
>>> +#ifdef CONFIG_AUDIT_TREE
>>>  static int put_tree_ref(struct audit_context *ctx, struct audit_chunk *chunk)
>>>  {
>>>  	struct audit_tree_refs *p = ctx->trees;
>>> @@ -1439,6 +1439,18 @@ static void audit_log_exit(struct audit_context *context, struct task_struct *ts
>>>
>>>  	audit_log_proctitle(tsk, context);
>>>
>>> +	ab = audit_log_start(context, GFP_KERNEL, AUDIT_CAPABILITY);
>>> +	if (ab) {
>>> +		audit_log_cap(ab, "cap_used", &context->cap_used);
>>> +		audit_log_end(ab);
>>> +	}
>>> +	ab = audit_log_start(context, GFP_KERNEL, AUDIT_CGROUP);
>>> +	if (ab) {
>>> +		audit_log_format(ab, "cgroups=");
>>> +		audit_cgroup_list(ab);
>>> +		audit_log_end(ab);
>>> +	}
>>> +
>>>  	/* Send end of event record to help user space know we are finished */
>>>  	ab = audit_log_start(context, GFP_KERNEL, AUDIT_EOE);
>>>  	if (ab)
>>> @@ -2428,3 +2440,17 @@ struct list_head *audit_killed_trees(void)
>>>  		return NULL;
>>>  	return &ctx->killed_trees;
>>>  }
>>> +
>>> +void audit_log_cap_use(int cap)
>>> +{
>>> +	struct audit_context *context = current->audit_context;
>>> +
>>> +	if (context) {
>>> +		cap_raise(context->cap_used, cap);
>>> +		audit_set_auditable(context);
>>> +	} else {
>>> +		audit_log(NULL, GFP_NOFS, AUDIT_CAPABILITY,
>>> +			  "cap_used=%d pid=%d no audit_context",
>>> +			  cap, task_pid_nr(current));
>>> +	}
>>> +}
>>> diff --git a/kernel/capability.c b/kernel/capability.c
>>> index 45432b5..d45d5b1 100644
>>> --- a/kernel/capability.c
>>> +++ b/kernel/capability.c
>>> @@ -366,8 +366,8 @@ bool has_capability_noaudit(struct task_struct *t, int cap)
>>>   * @ns:  The usernamespace we want the capability in
>>>   * @cap: The capability to be tested for
>>>   *
>>> - * Return true if the current task has the given superior capability currently
>>> - * available for use, false if not.
>>> + * Return true if the current task has the given superior capability
>>> + * currently available for use, false if not. Write an audit message.
>>>   *
>>>   * This sets PF_SUPERPRIV on the task if the capability is available on the
>>>   * assumption that it's about to be used.
>>> @@ -380,6 +380,7 @@ bool ns_capable(struct user_namespace *ns, int cap)
>>>  	}
>>>
>>>  	if (security_capable(current_cred(), ns, cap) == 0) {
>>> +		audit_log_cap_use(cap);
>>>  		current->flags |= PF_SUPERPRIV;
>>>  		return true;
>>>  	}
>>> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
>>> index 75c0ff0..1931679 100644
>>> --- a/kernel/cgroup.c
>>> +++ b/kernel/cgroup.c
>>> @@ -63,6 +63,7 @@
>>>  #include <linux/nsproxy.h>
>>>  #include <linux/proc_ns.h>
>>>  #include <net/sock.h>
>>> +#include <linux/audit.h>
>>>
>>>  /*
>>>   * pidlists linger the following amount before being destroyed.  The goal
>>> @@ -5789,6 +5790,67 @@ out:
>>>  	return retval;
>>>  }
>>>
>>> +/*
>>> + * audit_cgroup_list()
>>> + *  - Print task's cgroup paths with audit_log_format()
>>> + *  - Used for capability audit logging
>>> + *  - Otherwise very similar to proc_cgroup_show().
>>> + */
>>> +void audit_cgroup_list(struct audit_buffer *ab)
>>> +{
>>> +	char *buf, *path;
>>> +	struct cgroup_root *root;
>>> +
>>> +	buf = kmalloc(PATH_MAX, GFP_NOFS);
>>> +	if (!buf)
>>> +		return;
>>> +
>>> +	mutex_lock(&cgroup_mutex);
>>> +	spin_lock_irq(&css_set_lock);
>>> +
>>> +	for_each_root(root) {
>>> +		struct cgroup_subsys *ss;
>>> +		struct cgroup *cgrp;
>>> +		int ssid, count = 0;
>>> +
>>> +		if (root == &cgrp_dfl_root && !cgrp_dfl_visible)
>>> +			continue;
>>> +
>>> +		if (root != &cgrp_dfl_root)
>>> +			for_each_subsys(ss, ssid)
>>> +				if (root->subsys_mask & (1 << ssid))
>>> +					audit_log_format(ab, "%s%s",
>>> +							 count++ ? "," : "",
>>> +							 ss->legacy_name);
>>> +		if (strlen(root->name))
>>> +			audit_log_format(ab, "%sname=%s", count ? "," : "",
>>> +					 root->name);
>>> +		audit_log_format(ab, ":");
>>> +
>>> +		cgrp = task_cgroup_from_root(current, root);
>>> +
>>> +		if (cgroup_on_dfl(cgrp) || !(current->flags & PF_EXITING)) {
>>> +			path = cgroup_path_ns_locked(cgrp, buf, PATH_MAX,
>>> +						     current->nsproxy->cgroup_ns);
>>> +			if (!path)
>>> +				goto out_unlock;
>>> +		} else
>>> +			path = "/";
>>> +
>>> +		audit_log_format(ab, "%s", path);
>>> +
>>> +		if (cgroup_on_dfl(cgrp) && cgroup_is_dead(cgrp))
>>> +			audit_log_format(ab, " (deleted);");
>>> +		else
>>> +			audit_log_format(ab, ";");
>>> +	}
>>> +
>>> +out_unlock:
>>> +	spin_unlock_irq(&css_set_lock);
>>> +	mutex_unlock(&cgroup_mutex);
>>> +	kfree(buf);
>>> +}
>>> +
>>>  /* Display information about each subsystem and each hierarchy */
>>>  static int proc_cgroupstats_show(struct seq_file *m, void *v)
>>>  {
>>> --
>>> 2.8.1
>
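
To illustrate what collecting a capability mask from these records involves, here is a minimal userspace sketch in C. It ORs together the cap_used values found in a set of log lines and prints the resulting capability names. The helper names (parse_cap_used, summarize_caps, format_caps) are hypothetical and not part of the patch or of any audit tool. The name table assumes the capability numbering in include/uapi/linux/capability.h up to CAP_AUDIT_READ. The decimal branch assumes the "no audit_context" fallback format that the patch emits from audit_log_cap_use().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Capability names indexed by capability number (CAP_CHOWN = 0 ...
 * CAP_AUDIT_READ = 37), per include/uapi/linux/capability.h. */
static const char *const cap_names[] = {
	"chown", "dac_override", "dac_read_search", "fowner", "fsetid",
	"kill", "setgid", "setuid", "setpcap", "linux_immutable",
	"net_bind_service", "net_broadcast", "net_admin", "net_raw",
	"ipc_lock", "ipc_owner", "sys_module", "sys_rawio", "sys_chroot",
	"sys_ptrace", "sys_pacct", "sys_admin", "sys_boot", "sys_nice",
	"sys_resource", "sys_time", "sys_tty_config", "mknod", "lease",
	"audit_write", "audit_control", "setfcap", "mac_override",
	"mac_admin", "syslog", "wake_alarm", "block_suspend", "audit_read",
};
#define N_CAP_NAMES (sizeof(cap_names) / sizeof(cap_names[0]))

/* Hypothetical helper: extract the mask carried by one log line.
 * Returns 0 and stores the mask, or -1 if there is no usable cap_used. */
int parse_cap_used(const char *line, uint64_t *mask)
{
	static const char key[] = "cap_used=";
	const char *p = strstr(line, key);
	unsigned long long v;
	char *end;

	if (!p)
		return -1;
	p += sizeof(key) - 1;

	if (strstr(line, "no audit_context")) {
		/* Fallback record: one decimal capability number, not a mask. */
		v = strtoull(p, &end, 10);
		if (end == p || v >= 64)
			return -1;
		*mask = UINT64_C(1) << v;
		return 0;
	}

	/* Syscall-exit record: the whole mask in hexadecimal. */
	v = strtoull(p, &end, 16);
	if (end == p)
		return -1;
	*mask = v;
	return 0;
}

/* Hypothetical helper: OR together the masks of all lines that have one. */
uint64_t summarize_caps(const char *const *lines, size_t n)
{
	uint64_t total = 0, m;

	for (size_t i = 0; i < n; i++)
		if (parse_cap_used(lines[i], &m) == 0)
			total |= m;
	return total;
}

/* Hypothetical helper: render a mask as "cap_a,cap_b,...". Bits without a
 * known name are printed as numbers; output is cut short if buf is full. */
void format_caps(uint64_t mask, char *buf, size_t len)
{
	size_t used = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (unsigned int bit = 0; bit < 64; bit++) {
		int n;

		if (!(mask & (UINT64_C(1) << bit)))
			continue;
		if (bit < N_CAP_NAMES)
			n = snprintf(buf + used, len - used, "%scap_%s",
				     used ? "," : "", cap_names[bit]);
		else
			n = snprintf(buf + used, len - used, "%s%u",
				     used ? "," : "", bit);
		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
}
```

Passing the four type=1330 records from the test case to summarize_caps() yields 0x8200003, which format_caps() renders as cap_chown,cap_dac_override,cap_sys_admin,cap_mknod. A per-task summary logged at exit would make this merging step unnecessary in userspace.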