Re: [RFC PATCH 4/9] audit: add filtering for io_uring records

Richard Guy Briggs <rgb@xxxxxxxxxx> · Fri, 28 May 2021 18:35:44 -0400

On 2021-05-21 17:50, Paul Moore wrote:
> WARNING - This is a work in progress and should not be merged
> anywhere important.  It is almost surely not complete, and while it
> probably compiles it likely hasn't been booted and will do terrible
> things.  You have been warned.
> 
> This patch adds basic audit io_uring filtering, using as much of the
> existing audit filtering infrastructure as possible.  In order to do
> this we reuse the audit filter rule's syscall mask for the io_uring
> operation and we create a new filter for io_uring operations as
> AUDIT_FILTER_URING_EXIT/audit_filter_list[7].
> 
> <TODO - provide some additional guidance for the userspace tools>
> 
> Signed-off-by: Paul Moore <paul@xxxxxxxxxxxxxx>
> ---
>  include/uapi/linux/audit.h |    3 +-
>  kernel/auditfilter.c       |    4 ++-
>  kernel/auditsc.c           |   65 ++++++++++++++++++++++++++++++++++----------
>  3 files changed, 55 insertions(+), 17 deletions(-)
> 
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index b26e0c435e8b..621eb3c1076e 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -167,8 +167,9 @@
>  #define AUDIT_FILTER_EXCLUDE	0x05	/* Apply rule before record creation */
>  #define AUDIT_FILTER_TYPE	AUDIT_FILTER_EXCLUDE /* obsolete misleading naming */
>  #define AUDIT_FILTER_FS		0x06	/* Apply rule at __audit_inode_child */
> +#define AUDIT_FILTER_URING_EXIT	0x07	/* Apply rule at io_uring op exit */
>  
> -#define AUDIT_NR_FILTERS	7
> +#define AUDIT_NR_FILTERS	8
>  
>  #define AUDIT_FILTER_PREPEND	0x10	/* Prepend to front of list */
>  
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index db2c6b59dfc3..c21119c00504 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -44,7 +44,8 @@ struct list_head audit_filter_list[AUDIT_NR_FILTERS] = {
>  	LIST_HEAD_INIT(audit_filter_list[4]),
>  	LIST_HEAD_INIT(audit_filter_list[5]),
>  	LIST_HEAD_INIT(audit_filter_list[6]),
> -#if AUDIT_NR_FILTERS != 7
> +	LIST_HEAD_INIT(audit_filter_list[7]),
> +#if AUDIT_NR_FILTERS != 8
>  #error Fix audit_filter_list initialiser
>  #endif
>  };
> @@ -56,6 +57,7 @@ static struct list_head audit_rules_list[AUDIT_NR_FILTERS] = {
>  	LIST_HEAD_INIT(audit_rules_list[4]),
>  	LIST_HEAD_INIT(audit_rules_list[5]),
>  	LIST_HEAD_INIT(audit_rules_list[6]),
> +	LIST_HEAD_INIT(audit_rules_list[7]),
>  };
>  
>  DEFINE_MUTEX(audit_filter_mutex);
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index d8aa2c690bf9..4f6ab34020fb 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -799,6 +799,35 @@ static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
>  	return rule->mask[word] & bit;
>  }
>  
> +/**
> + * audit_filter_uring - apply filters to an io_uring operation
> + * @tsk: associated task
> + * @ctx: audit context
> + */
> +static void audit_filter_uring(struct task_struct *tsk,
> +			       struct audit_context *ctx)
> +{
> +	struct audit_entry *e;
> +	enum audit_state state;
> +
> +	if (auditd_test_task(tsk))
> +		return;

Is this necessary?  auditd and auditctl don't (intentionally) use any
io_uring functionality.  Is it possible it might inadvertantly use some
by virtue of libc or other library calls now or in the future?

> +	rcu_read_lock();
> +	list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_URING_EXIT],
> +				list) {
> +		if (audit_in_mask(&e->rule, ctx->uring_op) &&

While this seems like the most obvious approach given the parallels
between syscalls and io_uring operations, as coded here it won't work
due to the different mappings of syscall numbers and io_uring
operations unless we re-use the auditctl -S field with raw io_uring
operation numbers in the place of syscall numbers.  This should have
been obvious to me when I first looked at this patch.  It became obvious
when I started looking at the userspace auditctl.c.

The easy first step would be to use something like this:
	auditctl -a uring,always -S 18,28 -F key=uring_open
to monitor file open commands only.  The same is not yet possible for
the perm field, but there are so few io_uring ops at this point compared
with syscalls that it might be manageable.  The arch is irrelevant since
io_uring operation numbers are identical across all hardware as far as I
can tell.  Most of the rest of the fields should make sense if they do
for a syscall rule.  Here's a sample of userspace code to support this
patch:
	https://github.com/rgbriggs/audit-userspace/commit/a77baa1651b7ad841a220eb962d4cc92bc07dc96
	https://github.com/linux-audit/audit-userspace/compare/master...rgbriggs:ghau-iouring-filtering.v1.0

If we abuse the syscall infrastructure at first, we'd need a transition
plan to coordinate user and kernel switchover to seperate mechanisms for
the two to work together if the need should arise to have both syscall
and uring filters in the same rule.

I'm still exploring ideas on the best place to put this translation
table, in userspace libaudit, kaudit, io_op_defs[],
include/uapi/linux/io_uring.h, ...

For speed, the best would be in userspace libaudit, but that will be the
least obvious place for any io_uring developer to look when making any
updates to the list of io_uring operations and will most likely result
in additions escaping security logging.

Next best for speed would be in kaudit when instantiating rules, to
translate the syscall numbers to operation numbers and store them
natively in a different mask (audit_in_mask).  This also runs the risk
of additions escaping security logging.

After that, they could reuse the existing audit syscall filter
infrastructure and mask and translate on the fly from the specific op in
use to its equivalent syscall number when checking the existing filter.
This is the least disruptive to the audit code and the most likely to be
updated when new io_uring operations are added, but will incur an extra
step to translate the io_uring operation number to a syscall number in
the hot path.

Given that multiple aspects of security appear to have been a complete
afterthought to this feature, necessitating it to be bolted on after the
fact, it appears that the last option might be the most attractive to
the security folks, making this as easy as possible for fs folks to
update the audit code would be necessary to maintain security.

If there isn't a direct mapping between io_uring operations and
syscalls (the reverse isn't needed), then we'll need to duplicate the
mechanism throughout the stack starting in userspace auditctl and
libaudit.  Currently the bitmap for syscalls is 2k.  The current
io_uring op list appears to be 37.

It might be wise to deliberately not support auditctl "-w" (and the
exported audit_add_watch() function) since that is currently hardcoded
to the AUDIT_FILTER_EXIT list and is the old watch form [replaced by
audit_rule_fieldpair_data()] anyways that is more likely to be
deprecated.  It also appears to make sense not to support autrace (at
least initially).

Does any of this roughly make sense?

> +		    audit_filter_rules(tsk, &e->rule, ctx, NULL, &state,
> +				       false)) {
> +			rcu_read_unlock();
> +			ctx->current_state = state;
> +			return;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return;
> +}
> +
>  /* At syscall exit time, this filter is called if the audit_state is
>   * not low enough that auditing cannot take place, but is also not
>   * high enough that we already know we have to write an audit record
> @@ -1754,7 +1783,7 @@ static void audit_log_exit(void)
>   * __audit_free - free a per-task audit context
>   * @tsk: task whose audit context block to free
>   *
> - * Called from copy_process and do_exit
> + * Called from copy_process, do_exit, and the io_uring code
>   */
>  void __audit_free(struct task_struct *tsk)
>  {
> @@ -1772,15 +1801,21 @@ void __audit_free(struct task_struct *tsk)
>  	 * random task_struct that doesn't doesn't have any meaningful data we
>  	 * need to log via audit_log_exit().
>  	 */
> -	if (tsk == current && !context->dummy &&
> -	    context->context == AUDIT_CTX_SYSCALL) {
> +	if (tsk == current && !context->dummy) {
>  		context->return_valid = AUDITSC_INVALID;
>  		context->return_code = 0;
> -
> -		audit_filter_syscall(tsk, context);
> -		audit_filter_inodes(tsk, context);
> -		if (context->current_state == AUDIT_RECORD_CONTEXT)
> -			audit_log_exit();
> +		if (context->context == AUDIT_CTX_SYSCALL) {
> +			audit_filter_syscall(tsk, context);
> +			audit_filter_inodes(tsk, context);
> +			if (context->current_state == AUDIT_RECORD_CONTEXT)
> +				audit_log_exit();
> +		} else if (context->context == AUDIT_CTX_URING) {
> +			/* TODO: verify this case is real and valid */
> +			audit_filter_uring(tsk, context);
> +			audit_filter_inodes(tsk, context);
> +			if (context->current_state == AUDIT_RECORD_CONTEXT)
> +				audit_log_uring(context);
> +		}
>  	}
>  
>  	audit_set_context(tsk, NULL);
> @@ -1861,12 +1896,6 @@ void __audit_uring_exit(int success, long code)
>  {
>  	struct audit_context *ctx = audit_context();
>  
> -	/*
> -	 * TODO: At some point we will likely want to filter on io_uring ops
> -	 *       and other things similar to what we do for syscalls, but that
> -	 *       is something for another day; just record what we can here.
> -	 */
> -
>  	if (!ctx || ctx->dummy)
>  		goto out;
>  	if (ctx->context == AUDIT_CTX_SYSCALL) {
> @@ -1891,6 +1920,8 @@ void __audit_uring_exit(int success, long code)
>  		 * the behavior here.
>  		 */
>  		audit_filter_syscall(current, ctx);
> +		if (ctx->current_state != AUDIT_RECORD_CONTEXT)
> +			audit_filter_uring(current, ctx);
>  		audit_filter_inodes(current, ctx);
>  		if (ctx->current_state != AUDIT_RECORD_CONTEXT)
>  			return;
> @@ -1899,7 +1930,9 @@ void __audit_uring_exit(int success, long code)
>  		return;
>  	}
>  #if 1
> -	/* XXX - temporary hack to force record generation */
> +	/* XXX - temporary hack to force record generation, we are leaving this
> +	 *       enabled, but if you want to actually test the filtering you
> +	 *       need to disable this #if/#endif block */
>  	ctx->current_state = AUDIT_RECORD_CONTEXT;
>  #endif
>  
> @@ -1907,6 +1940,8 @@ void __audit_uring_exit(int success, long code)
>  	if (!list_empty(&ctx->killed_trees))
>  		audit_kill_trees(ctx);
>  
> +	/* run through both filters to ensure we set the filterkey properly */
> +	audit_filter_uring(current, ctx);
>  	audit_filter_inodes(current, ctx);
>  	if (ctx->current_state != AUDIT_RECORD_CONTEXT)
>  		goto out;
> 
> --
> Linux-audit mailing list
> Linux-audit@xxxxxxxxxx
> https://listman.redhat.com/mailman/listinfo/linux-audit

- RGB

--
Richard Guy Briggs <rgb@xxxxxxxxxx>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635