On 6/2/21 8:46 PM, Paul Moore wrote:
> On Wed, Jun 2, 2021 at 4:27 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>> On 5/28/21 5:02 PM, Paul Moore wrote:
>>> On Wed, May 26, 2021 at 4:19 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>>>> ... If we moved the _entry
>>>> and _exit calls into the individual operation case blocks (quick
>>>> openat example below) so that only certain operations were able to be
>>>> audited would that be acceptable assuming the high frequency ops were
>>>> untouched? My initial gut feeling was that this would involve >50% of
>>>> the ops, but Steve Grubb seems to think it would be less; it may be
>>>> time to look at that a bit more seriously, but if it gets a NACK
>>>> regardless it isn't worth the time - thoughts?
>>>>
>>>>   case IORING_OP_OPENAT:
>>>>     audit_uring_entry(req->opcode);
>>>>     ret = io_openat(req, issue_flags);
>>>>     audit_uring_exit(!ret, ret);
>>>>     break;
>>>
>>> I wanted to pose this question again in case it was lost in the
>>> thread, I suspect this may be the last option before we have to "fix"
>>> things at the Kconfig level. I definitely don't want to have to go
>>> that route, and I suspect most everyone on this thread feels the same,
>>> so I'm hopeful we can find a solution that is begrudgingly acceptable
>>> to both groups.
>>
>> May work for me, but have to ask how many, and what is the
>> criteria? I'd think anything opening a file or manipulating fs:
>>
>> IORING_OP_ACCEPT, IORING_OP_CONNECT, IORING_OP_OPENAT[2],
>> IORING_OP_RENAMEAT, IORING_OP_UNLINKAT, IORING_OP_SHUTDOWN,
>> IORING_OP_FILES_UPDATE
>> + coming mkdirat and others.
>>
>> IORING_OP_CLOSE? IORING_OP_SEND IORING_OP_RECV?
>>
>> What about?
>> IORING_OP_FSYNC, IORING_OP_SYNC_FILE_RANGE,
>> IORING_OP_FALLOCATE, IORING_OP_STATX,
>> IORING_OP_FADVISE, IORING_OP_MADVISE,
>> IORING_OP_EPOLL_CTL
>
> Looking quickly at v5.13-rc4 the following seems like candidates for
> auditing, there may be a small number of subtractions/additions to
> this list as people take a closer look, but it should serve as a
> starting point:
>
> IORING_OP_SENDMSG
> IORING_OP_RECVMSG
> IORING_OP_ACCEPT
> IORING_OP_CONNECT
> IORING_OP_FALLOCATE
> IORING_OP_OPENAT
> IORING_OP_CLOSE
> IORING_OP_MADVISE
> IORING_OP_OPENAT2
> IORING_OP_SHUTDOWN
> IORING_OP_RENAMEAT
> IORING_OP_UNLINKAT
>
> ... can you live with that list?

It will bloat the binary somewhat, but considering it's all in one
place, io_issue_sqe(), it's workable.

It's not nice to have send/recv msg in the list, but I admit they may
do some crazy things. What can be traced for them? At the moment of
issue_sqe(), not everything has been read from userspace yet. See:

io_sendmsg() {
	...;
	io_sendmsg_copy_hdr();
}

In most cases the header is copied only inside io_sendmsg(), and then
stashed for re-issue if needed.

>> Another question, io_uring may exercise asynchronous paths,
>> i.e. io_issue_sqe() returns before requests completes.
>> Shouldn't be the case for open/etc at the moment, but was that
>> considered?
>
> Yes, I noticed that when testing the code (and it makes sense when you
> look at how io_uring handles things). Depending on the state of the
> system when the io_uring request is submitted I've seen both sync and
> async io_uring operations with the associated different calling
> contexts. In the case where io_issue_sqe() needs to defer the
> operation to a different context you will see an audit record
> indicating that the operation failed and then another audit record
> when it completes; it's actually pretty interesting to be able to see
> how the system and io_uring are working.

Copying a reply to another message here, to avoid any misunderstanding:
"io_issue_sqe() may return 0 but leave the request inflight, which will
be completed asynchronously e.g. by IRQ, not going through io_issue_sqe()
or any io_read()/etc helpers again, and after last audit_end() had
already happened. That's the case with read/write/timeout, but is not
true for open/etc."

And there is interest in async send/recv[msg] as well (via IRQ as
described, callbacks, etc.).

> We could always mask out these delayed attempts, but at this early
> stage they are helpful, and they may be useful for admins.
>
>> I don't see it happening, but would prefer to keep it open
>> async reimplementation in a distant future. Does audit sleep?
>
> The only place in the audit_uring_entry()/audit_uring_exit() code path
> that could sleep at present is the call to audit_log_uring() which is
> made when the rules dictate that an audit record be generated. The
> offending code is an allocation in audit_log_uring() which is
> currently GFP_KERNEL but really should be GFP_ATOMIC, or similar. It
> was a copy-n-paste from the similar syscall function where GFP_KERNEL
> is appropriate due to the calling context at the end of the syscall.
> I'll change that as soon as I'm done with this email.

Ok, it depends on where this goes, but some hooks may be required not
to sleep because they don't run in a sleepable context.

>
> Of course if you are calling io_uring_enter(2), or something similar,
> then audit may sleep as part of the normal syscall processing (as
> mentioned above), but that is due to the fact that io_uring_enter(2)
> is a syscall and not because of anything in io_issue_sqe().
>

-- 
Pavel Begunkov
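The proposal discussed in this thread (audit entry/exit hooks only for a listed set of opcodes, high-frequency ops untouched) could also be expressed as a per-opcode flag checked once in the dispatcher, instead of repeating the hook pair in each case block. The following is a minimal user-space sketch under that assumption only: the demo_* names, the opcode values and the printf-based hooks are hypothetical stand-ins, not the kernel's io_uring or audit API. Like the openat example quoted above, it only models requests that finish before the dispatcher returns; a request left inflight and completed later (e.g. from IRQ) would never reach the exit hook.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for a few IORING_OP_* opcodes; the names and
 * values are illustrative and do not match the kernel's enum. */
enum demo_op {
	DEMO_OP_NOP,
	DEMO_OP_READV,
	DEMO_OP_WRITEV,
	DEMO_OP_OPENAT,
	DEMO_OP_CLOSE,
	DEMO_OP_SENDMSG,
	DEMO_OP_COUNT
};

/* One flag per opcode: true only for ops on the "audit these" list.
 * Unlisted entries are zero-initialised (false), so hot paths such as
 * readv/writev pay one table lookup and never enter the audit code. */
static const bool demo_op_audited[DEMO_OP_COUNT] = {
	[DEMO_OP_OPENAT]  = true,
	[DEMO_OP_CLOSE]   = true,
	[DEMO_OP_SENDMSG] = true,
};

static int demo_audit_records; /* number of completed entry/exit pairs */

/* Hypothetical stand-ins for audit_uring_entry()/audit_uring_exit(). */
static void demo_audit_entry(enum demo_op op)
{
	printf("audit entry: op %d\n", (int)op);
}

static void demo_audit_exit(bool success, int ret)
{
	printf("audit exit: success=%d ret=%d\n", success, ret);
	demo_audit_records++;
}

/* Pretend every operation completes synchronously and succeeds. */
static int demo_do_op(enum demo_op op)
{
	(void)op;
	return 0;
}

/* Analogue of the dispatch in io_issue_sqe(): the audit decision is
 * made once here rather than in every case block. */
static int demo_issue_sqe(enum demo_op op)
{
	bool audit = demo_op_audited[op];
	int ret;

	if (audit)
		demo_audit_entry(op);
	ret = demo_do_op(op);
	if (audit)
		demo_audit_exit(!ret, ret);
	return ret;
}

/* Issues a mix of ops and returns how many of them were audited. */
int demo_run(void)
{
	demo_audit_records = 0;
	demo_issue_sqe(DEMO_OP_READV);   /* hot path: skipped */
	demo_issue_sqe(DEMO_OP_WRITEV);  /* hot path: skipped */
	demo_issue_sqe(DEMO_OP_OPENAT);  /* listed: audited */
	demo_issue_sqe(DEMO_OP_SENDMSG); /* listed: audited */
	return demo_audit_records;
}
```

Calling demo_run() audits only the openat and sendmsg requests. The trade-off versus per-case hooks is one extra table, in exchange for keeping the list of audited opcodes in a single place that is easy to review as entries are added or removed.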