On 07/11/16 15:25, Serge E. Hallyn wrote:
Quoting Topi Miettinen (toiwoton(a)gmail.com):
> There are many basic ways to control processes, including capabilities,
> cgroups and resource limits. However, there are far fewer ways to find
> out useful values for the limits, except blind trial and error.
>
> Currently, there is no way to know which capabilities are actually used.
> Even the source code is only an implicit guide: in-depth knowledge of each
> capability is needed when analyzing a program to judge which
> capabilities the program will exercise.
>
> Generate an audit message at system call exit, when capabilities are used.
> This can then be used to configure capability sets for services by a
> software developer, maintainer or system administrator.
>
> Test case demonstrating basic capability monitoring with the new
> message types 1330 and 1331 and how the cgroups are displayed (boot to
> rdshell):
Thanks, Topi, I'll find time this week to look this over in detail.

How much chattier does this make the syslog/journald during a regular
boot? I was thinking "this is audit, we can choose what messages
will show up", but I guess that's only what auditd actually listens to,
not what the kernel emits? (Sorry, I've not looked at audit in a long
time.) Drat, that makes it seem like tracepoints would be better
after all. But let's see how much it adds to the noise.

For example, "loadkeys" causes thousands of entries. :-( I'm checking how
to avoid audit message rate limiting; at the moment some messages are lost.

It's still too easy to drown the logs with noise. That could be limited
a lot by emitting a message only when a capability is used for the
first time. But the question is how to define where to start counting
(fork, exec, and/or setpcap?). I'm also not sure that is the right
way to log, since the first use of a capability could be expected and
innocent, while the 100th could be malicious.
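
To make the "first use only" idea concrete, here is a minimal userspace
sketch of the bookkeeping it would need. It is not part of the patch: the
names cap_seen, cap_seen_reset() and cap_first_use() are hypothetical, and
the choice of when to call the reset (fork, exec, or a capability set
change) is exactly the open question above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task state: one bit per capability already reported. */
struct cap_seen {
	uint64_t mask;
};

/* Restart counting. When to call this (fork, exec, ...) is undecided. */
static void cap_seen_reset(struct cap_seen *s)
{
	s->mask = 0;
}

/* Return true only the first time 'cap' is seen since the last reset. */
static bool cap_first_use(struct cap_seen *s, int cap)
{
	uint64_t bit;

	if (cap < 0 || cap > 63)
		return false;	/* out of range: never report */

	bit = UINT64_C(1) << cap;
	if (s->mask & bit)
		return false;	/* already reported once */

	s->mask |= bit;
	return true;
}
```

With this scheme a message would be emitted only when cap_first_use()
returns true, so the 100th use would stay silent, which is the drawback
mentioned above.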
It's also very complex and error-prone to collect a capability mask from
audit logs, which was my original goal.
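
For reference, the collection step could look roughly like the sketch
below, assuming the log lines have already been read into strings and that
the records look like the type=1330 lines in the test output of this mail.
cap_mask_accumulate() is a hypothetical helper name. It only accepts a
16-digit hex value, so the decimal "cap_used=%d pid=%d no audit_context"
fallback record from the patch is skipped rather than misread as a mask.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * OR the cap_used= value found in one audit record line into *mask.
 * Returns 1 if the line carried a 16-digit hex mask, 0 otherwise.
 */
static int cap_mask_accumulate(const char *line, uint64_t *mask)
{
	static const char key[] = "cap_used=";
	const char *p = strstr(line, key);
	char *end;
	unsigned long long v;

	if (!p)
		return 0;
	p += sizeof(key) - 1;

	v = strtoull(p, &end, 16);
	if (end - p != 16)	/* not the 16-digit form shown in this mail */
		return 0;

	*mask |= (uint64_t)v;
	return 1;
}
```

Feeding it the mount and mkdir records from the test case would leave
0x200002 in the mask. Matching records to the right service or cgroup
still has to happen before this step, which is where most of the
complexity is.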
-Topi
> BusyBox v1.22.1 (Debian 1:1.22.0-19) built-in shell (ash)
> Enter 'help' for a list of built-in commands.
>
> (initramfs) cd /sys/fs
> (initramfs) mount -t cgroup2 cgroup cgroup
> [ 12.343152] audit_printk_skb: 5886 callbacks suppressed
> [ 12.355214] audit: type=1300 audit(1468234317.100:518): arch=c000003e syscall=165 success=yes exit=0 a0=7fffe1e9ae2d a1=7fffe1e9ae34 a2=7fffe1e9ae25 a3=8000 items=0 ppid=469 pid=470 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="mount" exe="/bin/mount" key=(null)
> [ 12.414853] audit: type=1327 audit(1468234317.100:518): proctitle=6D6F756E74002D74006367726F757032006367726F7570006367726F7570
> [ 12.438338] audit: type=1330 audit(1468234317.100:518): cap_used=0000000000200000
> [ 12.453893] audit: type=1331 audit(1468234317.100:518): cgroups=:/;
> (initramfs) cd cgroup
> (initramfs) mkdir test; cd test
> [ 17.335625] audit: type=1300 audit(1468234322.092:519): arch=c000003e syscall=83 success=yes exit=0 a0=7ffddfd75e29 a1=1ff a2=0 a3=1e2 items=0 ppid=469 pid=471 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="mkdir" exe="/bin/mkdir" key=(null)
> [ 17.392686] audit: type=1327 audit(1468234322.092:519): proctitle=6D6B6469720074657374
> [ 17.409404] audit: type=1330 audit(1468234322.092:519): cap_used=0000000000000002
> [ 17.425404] audit: type=1331 audit(1468234322.092:519): cgroups=:/;
> (initramfs) echo $$ >cgroup.procs
> (initramfs) mknod /dev/z_$$ c 1 2
> [ 28.385681] audit: type=1300 audit(1468234333.144:520): arch=c000003e syscall=133 success=yes exit=0 a0=7ffe16324e11 a1=21b6 a2=102 a3=5c9 items=0 ppid=469 pid=472 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="mknod" exe="/bin/mknod" key=(null)
> [ 28.443674] audit: type=1327 audit(1468234333.144:520): proctitle=6D6B6E6F64002F6465762F7A5F343639006300310032
> [ 28.465888] audit: type=1330 audit(1468234333.144:520): cap_used=0000000008000000
> [ 28.482080] audit: type=1331 audit(1468234333.144:520): cgroups=:/test;
> (initramfs) chown 1234 /dev/z_*
> [ 34.772992] audit: type=1300 audit(1468234339.532:521): arch=c000003e syscall=92 success=yes exit=0 a0=7ffd0b563e17 a1=4d2 a2=0 a3=60a items=0 ppid=469 pid=473 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=ttyS0 ses=4294967295 comm="chown" exe="/bin/chown" key=(null)
> [ 34.828569] audit: type=1327 audit(1468234339.532:521): proctitle=63686F776E0031323334002F6465762F7A5F343639
> [ 34.848747] audit: type=1330 audit(1468234339.532:521): cap_used=0000000000000001
> [ 34.864404] audit: type=1331 audit(1468234339.532:521): cgroups=:/test;
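
The cap_used values in the output above are bit masks indexed by
capability number. A small userspace sketch for turning them into names
follows. The table uses the capability numbering from
include/uapi/linux/capability.h, while cap_mask_to_names() is a
hypothetical helper and not something the patch provides.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Capability names indexed by capability number. */
static const char *const cap_names[] = {
	"CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
	"CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
	"CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE",
	"CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK",
	"CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT",
	"CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
	"CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME",
	"CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE",
	"CAP_AUDIT_CONTROL", "CAP_SETFCAP", "CAP_MAC_OVERRIDE",
	"CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM", "CAP_BLOCK_SUSPEND",
	"CAP_AUDIT_READ",
};
#define NCAPS (sizeof(cap_names) / sizeof(cap_names[0]))

/* Write the comma-separated names of the bits set in mask into buf. */
static char *cap_mask_to_names(uint64_t mask, char *buf, size_t len)
{
	size_t used = 0;
	unsigned int i;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < NCAPS; i++) {
		int n;

		if (!(mask & (UINT64_C(1) << i)))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? "," : "", cap_names[i]);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* buffer full: output is truncated */
		used += (size_t)n;
	}
	return buf;
}
```

Applied to the four records above, this maps 0000000000200000 to
CAP_SYS_ADMIN (mount), 0000000000000002 to CAP_DAC_OVERRIDE (mkdir),
0000000008000000 to CAP_MKNOD (mknod) and 0000000000000001 to CAP_CHOWN
(chown).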
>
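
The type=1327 proctitle fields in the same output are the command line,
hex-encoded with NUL bytes between the arguments. A minimal decoding
sketch, assuming the hex string has already been cut out of the record
(proctitle_decode() and hexval() are hypothetical helper names):

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Decode a hex-encoded proctitle into out, replacing the NUL argument
 * separators with spaces. Returns 0 on success, -1 on malformed input
 * or if out is too small.
 */
static int proctitle_decode(const char *hex, char *out, size_t len)
{
	size_t n = 0;

	while (hex[0] && hex[1]) {
		int hi = hexval(hex[0]);
		int lo = hexval(hex[1]);
		char c;

		if (hi < 0 || lo < 0 || n + 1 >= len)
			return -1;
		c = (char)(hi * 16 + lo);
		out[n++] = c ? c : ' ';
		hex += 2;
	}
	if (hex[0] || len == 0)	/* odd number of digits, or no room for NUL */
		return -1;
	out[n] = '\0';
	return 0;
}
```

For the mkdir record, 6D6B6469720074657374 decodes to "mkdir test".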
> Signed-off-by: Topi Miettinen <toiwoton(a)gmail.com>
> ---
> include/linux/audit.h | 4 +++
> include/linux/cgroup.h | 2 ++
> include/uapi/linux/audit.h | 2 ++
> kernel/audit.c | 7 +++---
> kernel/audit.h | 1 +
> kernel/auditsc.c | 28 ++++++++++++++++++++-
> kernel/capability.c | 5 ++--
> kernel/cgroup.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++
> 8 files changed, 105 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index e38e3fc..971cb2e 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -438,6 +438,8 @@ static inline void audit_mmap_fd(int fd, int flags)
> __audit_mmap_fd(fd, flags);
> }
>
> +extern void audit_log_cap_use(int cap);
> +
> extern int audit_n_rules;
> extern int audit_signals;
> #else /* CONFIG_AUDITSYSCALL */
> @@ -545,6 +547,8 @@ static inline void audit_mmap_fd(int fd, int flags)
> { }
> static inline void audit_ptrace(struct task_struct *t)
> { }
> +static inline void audit_log_cap_use(int cap)
> +{ }
> #define audit_n_rules 0
> #define audit_signals 0
> #endif /* CONFIG_AUDITSYSCALL */
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index a20320c..b5dc8aa 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -100,6 +100,8 @@ char *task_cgroup_path(struct task_struct *task, char *buf, size_t buflen);
> int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry);
> int proc_cgroup_show(struct seq_file *m, struct pid_namespace *ns,
> struct pid *pid, struct task_struct *tsk);
> +struct audit_buffer;
> +void audit_cgroup_list(struct audit_buffer *ab);
>
> void cgroup_fork(struct task_struct *p);
> extern int cgroup_can_fork(struct task_struct *p);
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index d820aa9..c1ae016 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -111,6 +111,8 @@
> #define AUDIT_PROCTITLE 1327 /* Proctitle emit event */
> #define AUDIT_FEATURE_CHANGE 1328 /* audit log listing feature changes */
> #define AUDIT_REPLACE 1329 /* Replace auditd if this packet unanswerd */
> +#define AUDIT_CAPABILITY 1330 /* Record showing capability use */
> +#define AUDIT_CGROUP 1331 /* Record showing cgroups */
>
> #define AUDIT_AVC 1400 /* SE Linux avc denial or grant */
> #define AUDIT_SELINUX_ERR 1401 /* Internal SE Linux Errors */
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 8d528f9..98dd920 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -54,6 +54,7 @@
> #include <linux/kthread.h>
> #include <linux/kernel.h>
> #include <linux/syscalls.h>
> +#include <linux/cgroup.h>
>
> #include <linux/audit.h>
>
> @@ -1682,7 +1683,7 @@ void audit_log_cap(struct audit_buffer *ab, char *prefix, kernel_cap_t *cap)
> {
> int i;
>
> - audit_log_format(ab, " %s=", prefix);
> + audit_log_format(ab, "%s=", prefix);
> CAP_FOR_EACH_U32(i) {
> audit_log_format(ab, "%08x",
> cap->cap[CAP_LAST_U32 - i]);
> @@ -1696,11 +1697,11 @@ static void audit_log_fcaps(struct audit_buffer *ab, struct audit_names *name)
> int log = 0;
>
> if (!cap_isclear(*perm)) {
> - audit_log_cap(ab, "cap_fp", perm);
> + audit_log_cap(ab, " cap_fp", perm);
> log = 1;
> }
> if (!cap_isclear(*inh)) {
> - audit_log_cap(ab, "cap_fi", inh);
> + audit_log_cap(ab, " cap_fi", inh);
> log = 1;
> }
>
> diff --git a/kernel/audit.h b/kernel/audit.h
> index a492f4c..680e8b5 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -202,6 +202,7 @@ struct audit_context {
> };
> int fds[2];
> struct audit_proctitle proctitle;
> + kernel_cap_t cap_used;
> };
>
> extern u32 audit_ever_enabled;
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 2672d10..32c3813 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -197,7 +197,6 @@ static int audit_match_filetype(struct audit_context *ctx, int val)
> * References in it _are_ dropped - at the same time we free/drop aux stuff.
> */
>
> -#ifdef CONFIG_AUDIT_TREE
> static void audit_set_auditable(struct audit_context *ctx)
> {
> if (!ctx->prio) {
> @@ -206,6 +205,7 @@ static void audit_set_auditable(struct audit_context *ctx)
> }
> }
>
> +#ifdef CONFIG_AUDIT_TREE
> static int put_tree_ref(struct audit_context *ctx, struct audit_chunk *chunk)
> {
> struct audit_tree_refs *p = ctx->trees;
> @@ -1439,6 +1439,18 @@ static void audit_log_exit(struct audit_context *context, struct task_struct *ts
>
> audit_log_proctitle(tsk, context);
>
> + ab = audit_log_start(context, GFP_KERNEL, AUDIT_CAPABILITY);
> + if (ab) {
> + audit_log_cap(ab, "cap_used", &context->cap_used);
> + audit_log_end(ab);
> + }
> + ab = audit_log_start(context, GFP_KERNEL, AUDIT_CGROUP);
> + if (ab) {
> + audit_log_format(ab, "cgroups=");
> + audit_cgroup_list(ab);
> + audit_log_end(ab);
> + }
> +
> /* Send end of event record to help user space know we are finished */
> ab = audit_log_start(context, GFP_KERNEL, AUDIT_EOE);
> if (ab)
> @@ -2428,3 +2440,17 @@ struct list_head *audit_killed_trees(void)
> return NULL;
> return &ctx->killed_trees;
> }
> +
> +void audit_log_cap_use(int cap)
> +{
> + struct audit_context *context = current->audit_context;
> +
> + if (context) {
> + cap_raise(context->cap_used, cap);
> + audit_set_auditable(context);
> + } else {
> + audit_log(NULL, GFP_NOFS, AUDIT_CAPABILITY,
> + "cap_used=%d pid=%d no audit_context",
> + cap, task_pid_nr(current));
> + }
> +}
> diff --git a/kernel/capability.c b/kernel/capability.c
> index 45432b5..d45d5b1 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -366,8 +366,8 @@ bool has_capability_noaudit(struct task_struct *t, int cap)
> * @ns: The usernamespace we want the capability in
> * @cap: The capability to be tested for
> *
> - * Return true if the current task has the given superior capability currently
> - * available for use, false if not.
> + * Return true if the current task has the given superior capability
> + * currently available for use, false if not. Write an audit message.
> *
> * This sets PF_SUPERPRIV on the task if the capability is available on the
> * assumption that it's about to be used.
> @@ -380,6 +380,7 @@ bool ns_capable(struct user_namespace *ns, int cap)
> }
>
> if (security_capable(current_cred(), ns, cap) == 0) {
> + audit_log_cap_use(cap);
> current->flags |= PF_SUPERPRIV;
> return true;
> }
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 75c0ff0..1931679 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -63,6 +63,7 @@
> #include <linux/nsproxy.h>
> #include <linux/proc_ns.h>
> #include <net/sock.h>
> +#include <linux/audit.h>
>
> /*
> * pidlists linger the following amount before being destroyed. The goal
> @@ -5789,6 +5790,67 @@ out:
> return retval;
> }
>
> +/*
> + * audit_cgroup_list()
> + * - Print task's cgroup paths with audit_log_format()
> + * - Used for capability audit logging
> + * - Otherwise very similar to proc_cgroup_show().
> + */
> +void audit_cgroup_list(struct audit_buffer *ab)
> +{
> + char *buf, *path;
> + struct cgroup_root *root;
> +
> + buf = kmalloc(PATH_MAX, GFP_NOFS);
> + if (!buf)
> + return;
> +
> + mutex_lock(&cgroup_mutex);
> + spin_lock_irq(&css_set_lock);
> +
> + for_each_root(root) {
> + struct cgroup_subsys *ss;
> + struct cgroup *cgrp;
> + int ssid, count = 0;
> +
> + if (root == &cgrp_dfl_root && !cgrp_dfl_visible)
> + continue;
> +
> + if (root != &cgrp_dfl_root)
> + for_each_subsys(ss, ssid)
> + if (root->subsys_mask & (1 << ssid))
> + audit_log_format(ab, "%s%s",
> + count++ ? "," : "",
> + ss->legacy_name);
> + if (strlen(root->name))
> + audit_log_format(ab, "%sname=%s", count ? "," : "",
> + root->name);
> + audit_log_format(ab, ":");
> +
> + cgrp = task_cgroup_from_root(current, root);
> +
> + if (cgroup_on_dfl(cgrp) || !(current->flags & PF_EXITING)) {
> + path = cgroup_path_ns_locked(cgrp, buf, PATH_MAX,
> + current->nsproxy->cgroup_ns);
> + if (!path)
> + goto out_unlock;
> + } else
> + path = "/";
> +
> + audit_log_format(ab, "%s", path);
> +
> + if (cgroup_on_dfl(cgrp) && cgroup_is_dead(cgrp))
> + audit_log_format(ab, " (deleted);");
> + else
> + audit_log_format(ab, ";");
> + }
> +
> +out_unlock:
> + spin_unlock_irq(&css_set_lock);
> + mutex_unlock(&cgroup_mutex);
> + kfree(buf);
> +}
> +
> /* Display information about each subsystem and each hierarchy */
> static int proc_cgroupstats_show(struct seq_file *m, void *v)
> {
> --
> 2.8.1