On 6/3/2021 3:51 AM, Pavel Begunkov wrote:
> On 6/2/21 8:46 PM, Paul Moore wrote:
>> On Wed, Jun 2, 2021 at 4:27 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>>> On 5/28/21 5:02 PM, Paul Moore wrote:
>>>> On Wed, May 26, 2021 at 4:19 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>>>>> ... If we moved the _entry
>>>>> and _exit calls into the individual operation case blocks (quick
>>>>> openat example below) so that only certain operations were able to be
>>>>> audited would that be acceptable assuming the high frequency ops were
>>>>> untouched? My initial gut feeling was that this would involve >50% of
>>>>> the ops, but Steve Grubb seems to think it would be less; it may be
>>>>> time to look at that a bit more seriously, but if it gets a NACK
>>>>> regardless it isn't worth the time - thoughts?
>>>>>
>>>>>   case IORING_OP_OPENAT:
>>>>>     audit_uring_entry(req->opcode);
>>>>>     ret = io_openat(req, issue_flags);
>>>>>     audit_uring_exit(!ret, ret);
>>>>>     break;
>>>> I wanted to pose this question again in case it was lost in the
>>>> thread, I suspect this may be the last option before we have to "fix"
>>>> things at the Kconfig level. I definitely don't want to have to go
>>>> that route, and I suspect most everyone on this thread feels the same,
>>>> so I'm hopeful we can find a solution that is begrudgingly acceptable
>>>> to both groups.
>>> May work for me, but have to ask how many, and what is the
>>> criteria? I'd think anything opening a file or manipulating fs:
>>>
>>> IORING_OP_ACCEPT, IORING_OP_CONNECT, IORING_OP_OPENAT[2],
>>> IORING_OP_RENAMEAT, IORING_OP_UNLINKAT, IORING_OP_SHUTDOWN,
>>> IORING_OP_FILES_UPDATE
>>> + coming mkdirat and others.
>>>
>>> IORING_OP_CLOSE? IORING_OP_SEND IORING_OP_RECV?
>>>
>>> What about?
>>> IORING_OP_FSYNC, IORING_OP_SYNC_FILE_RANGE,
>>> IORING_OP_FALLOCATE, IORING_OP_STATX,
>>> IORING_OP_FADVISE, IORING_OP_MADVISE,
>>> IORING_OP_EPOLL_CTL
>> Looking quickly at v5.13-rc4 the following seems like candidates for
>> auditing, there may be a small number of subtractions/additions to
>> this list as people take a closer look, but it should serve as a
>> starting point:
>>
>> IORING_OP_SENDMSG
>> IORING_OP_RECVMSG
>> IORING_OP_ACCEPT
>> IORING_OP_CONNECT
>> IORING_OP_FALLOCATE
>> IORING_OP_OPENAT
>> IORING_OP_CLOSE
>> IORING_OP_MADVISE
>> IORING_OP_OPENAT2
>> IORING_OP_SHUTDOWN
>> IORING_OP_RENAMEAT
>> IORING_OP_UNLINKAT
>>
>> ... can you live with that list?
> it will bloat binary somewhat, but considering it's all in one
> place -- io_issue_sqe(), it's workable.
>
> Not nice to have send/recv msg in the list, but I admit they
> may do some crazy things. What can be traced for them?

Both SELinux and Smack do access checks on packet operations.
As access may be denied by these checks, audit needs to be
available. This is true for UDS, IP and at least one other
protocol family.

> Because
> at the moment of issue_sqe() not everything is read from the
> userspace.
>
> see: io_sendmsg() { ...; io_sendmsg_copy_hdr(); },
>
> will copy header only in io_sendmsg() in most cases, and
> then stash it for re-issuing if needed.
>
>
>>> Another question, io_uring may exercise asynchronous paths,
>>> i.e. io_issue_sqe() returns before requests completes.
>>> Shouldn't be the case for open/etc at the moment, but was that
>>> considered?
>> Yes, I noticed that when testing the code (and it makes sense when you
>> look at how io_uring handles things). Depending on the state of the
>> system when the io_uring request is submitted I've seen both sync and
>> async io_uring operations with the associated different calling
>> contexts.
>> In the case where io_issue_sqe() needs to defer the
>> operation to a different context you will see an audit record
>> indicating that the operation failed and then another audit record
>> when it completes; it's actually pretty interesting to be able to see
>> how the system and io_uring are working.
> Copying a reply to another message to keep clear out
> of misunderstanding.
>
> "io_issue_sqe() may return 0 but leave the request inflight,
> which will be completed asynchronously e.g. by IRQ, not going
> through io_issue_sqe() or any io_read()/etc helpers again, and
> after last audit_end() had already happened.
> That's the case with read/write/timeout, but is not true for
> open/etc."
>
> And there is interest in async send/recv[msg] as well (via
> IRQ as described, callbacks, etc.).
>
>> We could always mask out these delayed attempts, but at this early
>> stage they are helpful, and they may be useful for admins.
>>
>>> I don't see it happening, but would prefer to keep it open
>>> async reimplementation in a distant future. Does audit sleep?
>> The only place in the audit_uring_entry()/audit_uring_exit() code path
>> that could sleep at present is the call to audit_log_uring() which is
>> made when the rules dictate that an audit record be generated. The
>> offending code is an allocation in audit_log_uring() which is
>> currently GFP_KERNEL but really should be GFP_ATOMIC, or similar. It
>> was a copy-n-paste from the similar syscall function where GFP_KERNEL
>> is appropriate due to the calling context at the end of the syscall.
>> I'll change that as soon as I'm done with this email.
> Ok, depends where it steers, but there may be a requirement to
> not sleep for some hooks because of not having a sleepable context.
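Pavel raises an async-completion issue in this message. A request can be left inflight after io_issue_sqe() returns 0, and it is then completed later, for example by IRQ, after the last audit hook has already run. A small userspace model can illustrate this.

This is only a sketch under stated assumptions. Every name in it is hypothetical and merely mimics the control flow described in the thread. Nothing here is io_uring or audit kernel code. The names are:

* `demo_req`
* `demo_issue()`
* `demo_complete()`
* `demo_audit_exit()`
* `demo_run()`

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model (not kernel code): the issue path returns 0 while the
 * request is still inflight, the audit exit hook runs on that return value,
 * and the real result arrives later on a completion path that never calls
 * back into the audit code.
 */

struct demo_req {
	bool inflight;  /* set while the request waits for async completion */
	int result;     /* final result, valid once inflight is false */
};

struct demo_audit_log {
	int records;    /* number of records emitted by the exit hook */
	int last_ret;   /* return value the exit hook observed */
};

/* Stand-in for an audit exit hook: it can only log what the issue path returned. */
static void demo_audit_exit(struct demo_audit_log *audit, int ret)
{
	audit->records++;
	audit->last_ret = ret;
}

/* Stand-in for an async-capable op such as read/write: queue the work, return 0. */
static int demo_issue(struct demo_req *req)
{
	req->inflight = true;
	return 0;
}

/* Stand-in for IRQ/callback completion: it bypasses the issue path and the
 * audit hook entirely. */
static void demo_complete(struct demo_req *req, int res)
{
	assert(req->inflight);
	req->inflight = false;
	req->result = res;
}

/* Issue, audit, then complete with an I/O error; return what audit saw. */
static struct demo_audit_log demo_run(struct demo_req *req)
{
	struct demo_audit_log audit = { 0, 0 };

	demo_audit_exit(&audit, demo_issue(req)); /* audit records 0 ("success") */
	demo_complete(req, -EIO);                 /* the real failure is never audited */
	return audit;
}
```

Call `demo_run(&req)` on a zeroed `struct demo_req`. The returned log holds one record whose return value is 0, while `req.result` ends up as `-EIO`. The hook only ever sees the issue-time return value. This matches the thread's distinction between read/write/timeout, which can complete this way, and open/etc., which currently do not.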
>
>> Of course if you are calling io_uring_enter(2), or something similar,
>> then audit may sleep as part of the normal syscall processing (as
>> mentioned above), but that is due to the fact that io_uring_enter(2)
>> is a syscall and not because of anything in io_issue_sqe().
>>
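Paul proposed at the top of the thread to move the _entry/_exit calls into individual case blocks, so that only listed ops are audited. That per-opcode approach can be modelled outside the kernel.

This is a hypothetical sketch. The assumptions are:

* The `gate_op` enum, `gate_issue()` and the hook stand-ins are invented for illustration.
* The enum values do not match the real IORING_OP_* numbers.
* Only two of the proposed opcodes are represented.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of an io_issue_sqe()-style switch in which only some
 * case blocks are wrapped by audit entry/exit hooks. Enum values are local
 * to this sketch and do NOT match the real IORING_OP_* numbers.
 */
enum gate_op {
	GATE_OP_READ,     /* high-frequency op: left unaudited */
	GATE_OP_WRITE,    /* high-frequency op: left unaudited */
	GATE_OP_OPENAT,   /* on the proposed audit list */
	GATE_OP_CONNECT,  /* on the proposed audit list */
};

struct gate_audit_state {
	int entries;        /* times the entry hook ran */
	int exits;          /* times the exit hook ran */
	bool last_success;  /* success flag passed to the last exit */
	int last_ret;       /* return value passed to the last exit */
};

/* Stand-in for audit_uring_entry(); a real hook would record the opcode. */
static void gate_audit_entry(struct gate_audit_state *st, enum gate_op op)
{
	(void)op;
	st->entries++;
}

/* Stand-in for audit_uring_exit(success, ret). */
static void gate_audit_exit(struct gate_audit_state *st, bool success, int ret)
{
	assert(st->exits < st->entries); /* every exit pairs with a prior entry */
	st->exits++;
	st->last_success = success;
	st->last_ret = ret;
}

/* Hypothetical handler: returns the simulated result, 0 on success or a
 * negative errno such as -EACCES when an LSM check denies the operation. */
static int gate_handle(enum gate_op op, int simulated_ret)
{
	(void)op;
	return simulated_ret;
}

static int gate_issue(struct gate_audit_state *st, enum gate_op op, int simulated_ret)
{
	int ret;

	switch (op) {
	case GATE_OP_OPENAT:
	case GATE_OP_CONNECT:
		/* Audited ops: wrap the handler, mirroring the quoted openat case. */
		gate_audit_entry(st, op);
		ret = gate_handle(op, simulated_ret);
		gate_audit_exit(st, !ret, ret);
		break;
	default:
		/* Unlisted ops such as read/write skip the hooks entirely. */
		ret = gate_handle(op, simulated_ret);
		break;
	}
	return ret;
}
```

The design choice being modelled is that unlisted, high-frequency ops pay nothing beyond the switch itself, while each audited op carries its own entry/exit pair. That per-case duplication is where the binary growth Pavel mentions would come from.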