Re: [RFC PATCH 2/9] audit,io_uring,io-wq: add some basic audit support to io_uring

Paul Moore <paul@xxxxxxxxxxxxxx> · Wed, 2 Jun 2021 15:46:08 -0400

On Wed, Jun 2, 2021 at 4:27 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> On 5/28/21 5:02 PM, Paul Moore wrote:
> > On Wed, May 26, 2021 at 4:19 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >> ... If we moved the _entry
> >> and _exit calls into the individual operation case blocks (quick
> >> openat example below) so that only certain operations were able to be
> >> audited would that be acceptable assuming the high frequency ops were
> >> untouched?  My initial gut feeling was that this would involve >50% of
> >> the ops, but Steve Grubb seems to think it would be less; it may be
> >> time to look at that a bit more seriously, but if it gets a NACK
> >> regardless it isn't worth the time - thoughts?
> >>
> >>   case IORING_OP_OPENAT:
> >>     audit_uring_entry(req->opcode);
> >>     ret = io_openat(req, issue_flags);
> >>     audit_uring_exit(!ret, ret);
> >>     break;
> >
> > I wanted to pose this question again in case it was lost in the
> > thread, I suspect this may be the last option before we have to "fix"
> > things at the Kconfig level.  I definitely don't want to have to go
> > that route, and I suspect most everyone on this thread feels the same,
> > so I'm hopeful we can find a solution that is begrudgingly acceptable
> > to both groups.
>
> May work for me, but have to ask how many, and what is the
> criteria? I'd think anything opening a file or manipulating fs:
>
> IORING_OP_ACCEPT, IORING_OP_CONNECT, IORING_OP_OPENAT[2],
> IORING_OP_RENAMEAT, IORING_OP_UNLINKAT, IORING_OP_SHUTDOWN,
> IORING_OP_FILES_UPDATE
> + coming mkdirat and others.
>
> IORING_OP_CLOSE? IORING_OP_SEND IORING_OP_RECV?
>
> What about?
> IORING_OP_FSYNC, IORING_OP_SYNC_FILE_RANGE,
> IORING_OP_FALLOCATE, IORING_OP_STATX,
> IORING_OP_FADVISE, IORING_OP_MADVISE,
> IORING_OP_EPOLL_CTL

Looking quickly at v5.13-rc4 the following seems like candidates for
auditing, there may be a small number of subtractions/additions to
this list as people take a closer look, but it should serve as a
starting point:

IORING_OP_SENDMSG
IORING_OP_RECVMSG
IORING_OP_ACCEPT
IORING_OP_CONNECT
IORING_OP_FALLOCATE
IORING_OP_OPENAT
IORING_OP_CLOSE
IORING_OP_MADVISE
IORING_OP_OPENAT2
IORING_OP_SHUTDOWN
IORING_OP_RENAMEAT
IORING_OP_UNLINKAT

... can you live with that list?

> Another question, io_uring may exercise asynchronous paths,
> i.e. io_issue_sqe() returns before requests completes.
> Shouldn't be the case for open/etc at the moment, but was that
> considered?

Yes, I noticed that when testing the code (and it makes sense when you
look at how io_uring handles things).  Depending on the state of the
system when the io_uring request is submitted I've seen both sync and
async io_uring operations with the associated different calling
contexts.  In the case where io_issue_sqe() needs to defer the
operation to a different context you will see an audit record
indicating that the operation failed and then another audit record
when it completes; it's actually pretty interesting to be able to see
how the system and io_uring are working.

We could always mask out these delayed attempts, but at this early
stage they are helpful, and they may be useful for admins.

> I don't see it happening, but would prefer to keep it open
> async reimplementation in a distant future. Does audit sleep?

The only place in the audit_uring_entry()/audit_uring_exit() code path
that could sleep at present is the call to audit_log_uring() which is
made when the rules dictate that an audit record be generated.  The
offending code is an allocation in audit_log_uring() which is
currently GFP_KERNEL but really should be GFP_ATOMIC, or similar.  It
was a copy-n-paste from the similar syscall function where GFP_KERNEL
is appropriate due to the calling context at the end of the syscall.
I'll change that as soon as I'm done with this email.

Of course if you are calling io_uring_enter(2), or something similar,
then audit may sleep as part of the normal syscall processing (as
mentioned above), but that is due to the fact that io_uring_enter(2)
is a syscall and not because of anything in io_issue_sqe().

-- 
paul moore
www.paul-moore.com