On Fri, Oct 13, 2023 at 10:21 AM Jens Axboe <axboe@xxxxxxxxx> wrote: > On 10/13/23 2:24 AM, Christian Brauner wrote: > > On Thu, Oct 12, 2023 at 02:55:18PM -0700, Dan Clash wrote: > >> An io_uring openat operation can update an audit reference count > >> from multiple threads resulting in the call trace below. > >> > >> A call to io_uring_submit() with a single openat op with a flag of > >> IOSQE_ASYNC results in the following reference count updates. > >> > >> These first part of the system call performs two increments that do not race. > >> > >> do_syscall_64() > >> __do_sys_io_uring_enter() > >> io_submit_sqes() > >> io_openat_prep() > >> __io_openat_prep() > >> getname() > >> getname_flags() /* update 1 (increment) */ > >> __audit_getname() /* update 2 (increment) */ > >> > >> The openat op is queued to an io_uring worker thread which starts the > >> opportunity for a race. The system call exit performs one decrement. > >> > >> do_syscall_64() > >> syscall_exit_to_user_mode() > >> syscall_exit_to_user_mode_prepare() > >> __audit_syscall_exit() > >> audit_reset_context() > >> putname() /* update 3 (decrement) */ > >> > >> The io_uring worker thread performs one increment and two decrements. > >> These updates can race with the system call decrement. > >> > >> io_wqe_worker() > >> io_worker_handle_work() > >> io_wq_submit_work() > >> io_issue_sqe() > >> io_openat() > >> io_openat2() > >> do_filp_open() > >> path_openat() > >> __audit_inode() /* update 4 (increment) */ > >> putname() /* update 5 (decrement) */ > >> __audit_uring_exit() > >> audit_reset_context() > >> putname() /* update 6 (decrement) */ > >> > >> The fix is to change the refcnt member of struct audit_names > >> from int to atomic_t. > >> > >> kernel BUG at fs/namei.c:262! > >> Call Trace: > >> ... > >> ? putname+0x68/0x70 > >> audit_reset_context.part.0.constprop.0+0xe1/0x300 > >> __audit_uring_exit+0xda/0x1c0 > >> io_issue_sqe+0x1f3/0x450 > >> ? lock_timer_base+0x3b/0xd0 > >> io_wq_submit_work+0x8d/0x2b0 > >> ? __try_to_del_timer_sync+0x67/0xa0 > >> io_worker_handle_work+0x17c/0x2b0 > >> io_wqe_worker+0x10a/0x350 > >> > >> Cc: <stable@xxxxxxxxxxxxxxx> > >> Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > >> Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") > >> Signed-off-by: Dan Clash <daclash@xxxxxxxxxxxxxxxxxxx> > >> --- > >> fs/namei.c | 9 +++++---- > >> include/linux/fs.h | 2 +- > >> kernel/auditsc.c | 8 ++++---- > >> 3 files changed, 10 insertions(+), 9 deletions(-) > >> > >> diff --git a/fs/namei.c b/fs/namei.c > >> index 567ee547492b..94565bd7e73f 100644 > >> --- a/fs/namei.c > >> +++ b/fs/namei.c > >> @@ -188,7 +188,7 @@ getname_flags(const char __user *filename, int flags, int *empty) > >> } > >> } > >> > >> - result->refcnt = 1; > >> + atomic_set(&result->refcnt, 1); > >> /* The empty path is special. */ > >> if (unlikely(!len)) { > >> if (empty) > >> @@ -249,7 +249,7 @@ getname_kernel(const char * filename) > >> memcpy((char *)result->name, filename, len); > >> result->uptr = NULL; > >> result->aname = NULL; > >> - result->refcnt = 1; > >> + atomic_set(&result->refcnt, 1); > >> audit_getname(result); > >> > >> return result; > >> @@ -261,9 +261,10 @@ void putname(struct filename *name) > >> if (IS_ERR(name)) > >> return; > >> > >> - BUG_ON(name->refcnt <= 0); > >> + if (WARN_ON_ONCE(!atomic_read(&name->refcnt))) > >> + return; > >> > >> - if (--name->refcnt > 0) > >> + if (!atomic_dec_and_test(&name->refcnt)) > >> return; > > > > Fine by me. I'd write this as: > > > > count = atomic_dec_if_positive(&name->refcnt); > > if (WARN_ON_ONCE(unlikely(count < 0)) > > return; > > if (count > 0) > > return; > > Would be fine too, my suspicion was that most archs don't implement a > primitive for that, and hence it might be more expensive than > atomic_read()/atomic_dec_and_test() which do. But I haven't looked at > the code generation. The dec_if_positive degenerates to a atomic cmpxchg > for most cases. I'm not too concerned, either approach works for me, the important bit is moving to an atomic_t/refcount_t so we can protect ourselves against the race. The patch looks good to me and I'd like to get this fix merged. Dan, barring any further back-and-forth on the putname() change, I would say to go ahead and make the change Christian suggested and repost the patch. Based on Jens comment above it seems safe to preserve his 'Reviewed-by:' tag on the next revision. Assuming there are no objections posted in the meantime, I'll plan to merge the next revision into the audit/stable-6.6 branch and get that up to Linus (likely next week since it's Friday). Thanks everyone! -- paul-moore.com