Re: [RFC PATCH 2/9] audit,io_uring,io-wq: add some basic audit support to io_uring

Pavel Begunkov <asml.silence@xxxxxxxxx> · Thu, 3 Jun 2021 11:51:44 +0100

On 6/2/21 8:46 PM, Paul Moore wrote:
> On Wed, Jun 2, 2021 at 4:27 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>> On 5/28/21 5:02 PM, Paul Moore wrote:
>>> On Wed, May 26, 2021 at 4:19 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>>>> ... If we moved the _entry
>>>> and _exit calls into the individual operation case blocks (quick
>>>> openat example below) so that only certain operations were able to be
>>>> audited would that be acceptable assuming the high frequency ops were
>>>> untouched?  My initial gut feeling was that this would involve >50% of
>>>> the ops, but Steve Grubb seems to think it would be less; it may be
>>>> time to look at that a bit more seriously, but if it gets a NACK
>>>> regardless it isn't worth the time - thoughts?
>>>>
>>>>   case IORING_OP_OPENAT:
>>>>     audit_uring_entry(req->opcode);
>>>>     ret = io_openat(req, issue_flags);
>>>>     audit_uring_exit(!ret, ret);
>>>>     break;
>>>
>>> I wanted to pose this question again in case it was lost in the
>>> thread, I suspect this may be the last option before we have to "fix"
>>> things at the Kconfig level.  I definitely don't want to have to go
>>> that route, and I suspect most everyone on this thread feels the same,
>>> so I'm hopeful we can find a solution that is begrudgingly acceptable
>>> to both groups.
>>
>> May work for me, but have to ask how many, and what is the
>> criteria? I'd think anything opening a file or manipulating fs:
>>
>> IORING_OP_ACCEPT, IORING_OP_CONNECT, IORING_OP_OPENAT[2],
>> IORING_OP_RENAMEAT, IORING_OP_UNLINKAT, IORING_OP_SHUTDOWN,
>> IORING_OP_FILES_UPDATE
>> + coming mkdirat and others.
>>
>> IORING_OP_CLOSE? IORING_OP_SEND IORING_OP_RECV?
>>
>> What about?
>> IORING_OP_FSYNC, IORING_OP_SYNC_FILE_RANGE,
>> IORING_OP_FALLOCATE, IORING_OP_STATX,
>> IORING_OP_FADVISE, IORING_OP_MADVISE,
>> IORING_OP_EPOLL_CTL
> 
> Looking quickly at v5.13-rc4 the following seems like candidates for
> auditing, there may be a small number of subtractions/additions to
> this list as people take a closer look, but it should serve as a
> starting point:
> 
> IORING_OP_SENDMSG
> IORING_OP_RECVMSG
> IORING_OP_ACCEPT
> IORING_OP_CONNECT
> IORING_OP_FALLOCATE
> IORING_OP_OPENAT
> IORING_OP_CLOSE
> IORING_OP_MADVISE
> IORING_OP_OPENAT2
> IORING_OP_SHUTDOWN
> IORING_OP_RENAMEAT
> IORING_OP_UNLINKAT
> 
> ... can you live with that list?

It will bloat the binary somewhat, but considering it's all in one
place, io_issue_sqe(), it's workable.
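
To make the shape of this concrete, here is a rough userspace sketch of
per-opcode auditing in a single dispatch function. It is not the patch
and not kernel code: the mock_op enum, mock_issue(), mock_handler() and
the two counters are all made up for illustration, and only a few of
the opcodes from the list above are modelled.

```c
#include <stdbool.h>

/* Hypothetical opcode subset; the names only echo the list in this thread. */
enum mock_op {
	MOCK_OP_READ,
	MOCK_OP_WRITE,
	MOCK_OP_SENDMSG,
	MOCK_OP_OPENAT,
	MOCK_OP_CLOSE,
};

/* Counters standing in for emitted audit records. */
static int audit_entries;
static int audit_exits;

static void mock_audit_entry(enum mock_op op)
{
	(void)op;
	audit_entries++;
}

static void mock_audit_exit(bool success, int ret)
{
	(void)success;
	(void)ret;
	audit_exits++;
}

/* Stand-in for the real per-op handler; always succeeds here. */
static int mock_handler(enum mock_op op)
{
	(void)op;
	return 0;
}

/* All audit calls live in this one switch, as in the openat example. */
static int mock_issue(enum mock_op op)
{
	int ret;

	switch (op) {
	case MOCK_OP_SENDMSG:
	case MOCK_OP_OPENAT:
	case MOCK_OP_CLOSE:
		mock_audit_entry(op);
		ret = mock_handler(op);
		mock_audit_exit(!ret, ret);
		break;
	default:
		/* High-frequency ops (read/write) skip the audit hooks. */
		ret = mock_handler(op);
		break;
	}
	return ret;
}
```

In this sketch mock_issue(MOCK_OP_READ) leaves both counters untouched,
while mock_issue(MOCK_OP_OPENAT) bumps each of them once.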

It's not nice to have send/recv msg in the list, but I admit they
may do some crazy things. What can be traced for them? At the time
io_issue_sqe() is called, not everything has been read from
userspace yet.

See io_sendmsg() { ...; io_sendmsg_copy_hdr(); }:

in most cases the header is copied only inside io_sendmsg(), and
is then stashed for re-issuing if needed.
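
A simplified userspace model of that "copy on issue, stash on retry"
pattern, just to show why an entry hook placed before the handler
cannot see the header yet. Everything here is hypothetical (struct
mock_hdr, struct mock_msg_req, mock_sendmsg(), MOCK_EAGAIN); the real
code paths differ in detail.

```c
#include <stdbool.h>
#include <string.h>

#define MOCK_EAGAIN 11

/* Made-up stand-in for a message header living in "userspace". */
struct mock_hdr {
	int namelen;
	int flags;
};

struct mock_msg_req {
	const struct mock_hdr *user_hdr;	/* pointer taken from the SQE */
	struct mock_hdr stashed;		/* copy kept for a re-issue */
	bool has_stash;
	int copies;				/* reads from "userspace" so far */
};

/*
 * The header is only read inside the handler. If the attempt would
 * block, the copy is stashed so that a later re-issue does not touch
 * userspace memory again.
 */
static int mock_sendmsg(struct mock_msg_req *req, bool would_block)
{
	struct mock_hdr local;
	const struct mock_hdr *hdr;

	if (req->has_stash) {
		hdr = &req->stashed;
	} else {
		memcpy(&local, req->user_hdr, sizeof(local));
		req->copies++;
		hdr = &local;
	}

	if (would_block) {
		if (!req->has_stash) {
			req->stashed = *hdr;
			req->has_stash = true;
		}
		return -MOCK_EAGAIN;
	}

	/* Pretend the send succeeded; return a value taken from the header. */
	return hdr->namelen;
}
```

Until mock_sendmsg() runs, req->copies is 0, so anything logged before
the call has only the raw user pointer to work with.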

>> Another question, io_uring may exercise asynchronous paths,
>> i.e. io_issue_sqe() returns before the request completes.
>> Shouldn't be the case for open/etc at the moment, but was that
>> considered?
> 
> Yes, I noticed that when testing the code (and it makes sense when you
> look at how io_uring handles things).  Depending on the state of the
> system when the io_uring request is submitted I've seen both sync and
> async io_uring operations with the associated different calling
> contexts.  In the case where io_issue_sqe() needs to defer the
> operation to a different context you will see an audit record
> indicating that the operation failed and then another audit record
> when it completes; it's actually pretty interesting to be able to see
> how the system and io_uring are working.

Copying my reply from another message here to avoid any
misunderstanding.

"io_issue_sqe() may return 0 but leave the request inflight,
which will be completed asynchronously e.g. by IRQ, not going
through io_issue_sqe() or any io_read()/etc helpers again, and
after last audit_end() had already happened.
That's the case with read/write/timeout, but is not true for
open/etc."

And there is interest in async send/recv[msg] as well (via
IRQ as described, callbacks, etc.).
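
The ordering problem can be modelled in userspace like this. It is a
sketch under assumptions only: struct mock_async_req,
mock_issue_async(), mock_irq_complete() and the counters are invented
for illustration and do not correspond to real kernel symbols.

```c
#include <stdbool.h>

struct mock_async_req {
	bool inflight;
	bool completed;
	int result;
};

/* Counters standing in for audit records. */
static int audit_exits;
static int exits_before_completion;

static void mock_audit_entry(void)
{
}

static void mock_audit_exit(const struct mock_async_req *req)
{
	audit_exits++;
	if (!req->completed)
		exits_before_completion++;	/* logged before the real result */
}

/* Returns 0 in both cases; with async set the request stays in flight. */
static int mock_issue_async(struct mock_async_req *req, bool async)
{
	mock_audit_entry();
	if (async) {
		req->inflight = true;		/* handed off, no result yet */
	} else {
		req->result = 0;
		req->completed = true;
	}
	mock_audit_exit(req);
	return 0;
}

/* Completion from "IRQ" context: never passes through the issue path. */
static void mock_irq_complete(struct mock_async_req *req, int res)
{
	req->result = res;
	req->inflight = false;
	req->completed = true;
	/* No audit hook runs here, so the final result is never logged. */
}
```

With async set, the exit hook fires while the request is still in
flight, and the later mock_irq_complete() adds no further record.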

> We could always mask out these delayed attempts, but at this early
> stage they are helpful, and they may be useful for admins.
> 
>> I don't see it happening, but would prefer to keep open the option
>> of an async reimplementation in the distant future. Does audit sleep?
> 
> The only place in the audit_uring_entry()/audit_uring_exit() code path
> that could sleep at present is the call to audit_log_uring() which is
> made when the rules dictate that an audit record be generated.  The
> offending code is an allocation in audit_log_uring() which is
> currently GFP_KERNEL but really should be GFP_ATOMIC, or similar.  It
> was a copy-n-paste from the similar syscall function where GFP_KERNEL
> is appropriate due to the calling context at the end of the syscall.
> I'll change that as soon as I'm done with this email.

OK, it depends on where this goes, but some hooks may be required
not to sleep because they won't run in a sleepable context.

> 
> Of course if you are calling io_uring_enter(2), or something similar,
> then audit may sleep as part of the normal syscall processing (as
> mentioned above), but that is due to the fact that io_uring_enter(2)
> is a syscall and not because of anything in io_issue_sqe().
> 

-- 
Pavel Begunkov