Re: [PATCH] audit,io_uring: io_uring openat triggers audit reference count underflow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2023-10-13 11:43, Paul Moore wrote:
On Fri, Oct 13, 2023 at 10:21 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
On 10/13/23 2:24 AM, Christian Brauner wrote:
On Thu, Oct 12, 2023 at 02:55:18PM -0700, Dan Clash wrote:
An io_uring openat operation can update an audit reference count
from multiple threads resulting in the call trace below.

A call to io_uring_submit() with a single openat op with a flag of
IOSQE_ASYNC results in the following reference count updates.

These first part of the system call performs two increments that do not race.

do_syscall_64()
   __do_sys_io_uring_enter()
     io_submit_sqes()
       io_openat_prep()
         __io_openat_prep()
           getname()
             getname_flags()       /* update 1 (increment) */
               __audit_getname()   /* update 2 (increment) */

The openat op is queued to an io_uring worker thread which starts the
opportunity for a race.  The system call exit performs one decrement.

do_syscall_64()
   syscall_exit_to_user_mode()
     syscall_exit_to_user_mode_prepare()
       __audit_syscall_exit()
         audit_reset_context()
            putname()              /* update 3 (decrement) */

The io_uring worker thread performs one increment and two decrements.
These updates can race with the system call decrement.

io_wqe_worker()
   io_worker_handle_work()
     io_wq_submit_work()
       io_issue_sqe()
         io_openat()
           io_openat2()
             do_filp_open()
               path_openat()
                 __audit_inode()   /* update 4 (increment) */
             putname()             /* update 5 (decrement) */
         __audit_uring_exit()
           audit_reset_context()
             putname()             /* update 6 (decrement) */

The fix is to change the refcnt member of struct audit_names
from int to atomic_t.

kernel BUG at fs/namei.c:262!
Call Trace:
...
  ? putname+0x68/0x70
  audit_reset_context.part.0.constprop.0+0xe1/0x300
  __audit_uring_exit+0xda/0x1c0
  io_issue_sqe+0x1f3/0x450
  ? lock_timer_base+0x3b/0xd0
  io_wq_submit_work+0x8d/0x2b0
  ? __try_to_del_timer_sync+0x67/0xa0
  io_worker_handle_work+0x17c/0x2b0
  io_wqe_worker+0x10a/0x350

Cc: <stable@xxxxxxxxxxxxxxx>
Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Dan Clash <daclash@xxxxxxxxxxxxxxxxxxx>
---
  fs/namei.c         | 9 +++++----
  include/linux/fs.h | 2 +-
  kernel/auditsc.c   | 8 ++++----
  3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 567ee547492b..94565bd7e73f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -188,7 +188,7 @@ getname_flags(const char __user *filename, int flags, int *empty)
              }
      }

-    result->refcnt = 1;
+    atomic_set(&result->refcnt, 1);
      /* The empty path is special. */
      if (unlikely(!len)) {
              if (empty)
@@ -249,7 +249,7 @@ getname_kernel(const char * filename)
      memcpy((char *)result->name, filename, len);
      result->uptr = NULL;
      result->aname = NULL;
-    result->refcnt = 1;
+    atomic_set(&result->refcnt, 1);
      audit_getname(result);

      return result;
@@ -261,9 +261,10 @@ void putname(struct filename *name)
      if (IS_ERR(name))
              return;

-    BUG_ON(name->refcnt <= 0);
+    if (WARN_ON_ONCE(!atomic_read(&name->refcnt)))
+            return;

-    if (--name->refcnt > 0)
+    if (!atomic_dec_and_test(&name->refcnt))
              return;

Fine by me. I'd write this as:

count = atomic_dec_if_positive(&name->refcnt);
if (WARN_ON_ONCE(unlikely(count < 0))
       return;
if (count > 0)
       return;

Would be fine too, my suspicion was that most archs don't implement a
primitive for that, and hence it might be more expensive than
atomic_read()/atomic_dec_and_test() which do. But I haven't looked at
the code generation. The dec_if_positive degenerates to a atomic cmpxchg
for most cases.

I'm not too concerned, either approach works for me, the important bit
is moving to an atomic_t/refcount_t so we can protect ourselves
against the race.  The patch looks good to me and I'd like to get this
fix merged.

Dan, barring any further back-and-forth on the putname() change, I
would say to go ahead and make the change Christian suggested and
repost the patch.  Based on Jens comment above it seems safe to
preserve his 'Reviewed-by:' tag on the next revision.  Assuming there
are no objections posted in the meantime, I'll plan to merge the next
revision into the audit/stable-6.6 branch and get that up to Linus
(likely next week since it's Friday).

I did not see many arch implementations of atomic_dec_if_positive.
The x86_64 generated code looks like arch_atomic_dec_unless_positive()
in atomic-arch-fallback.h with a loop around lock cmpxchg.

I did not want to compound the email race so I did not send patch v2 but I can if desired.


devvm2 ~/linux $ sysctl kernel.arch
kernel.arch = x86_64

devvm2 ~/linux $ cat -n ./fs/namei.c | grep -B 7 -A 4 atomic_dec_if_positive
   259  void putname(struct filename *name)
   260  {
   261          int count;
   262
   263          if (IS_ERR(name))
   264                  return;
   265
   266          count = atomic_dec_if_positive(&name->refcnt);
   267          if (WARN_ON_ONCE(unlikely(count < 0)))
   268                  return;
   269          if (count > 0)
   270                  return;

devvm2 ~/linux $ objdump --disassemble --line-numbers ./fs/namei.o | \
grep -B 8 -A 12 atomic_dec_if_positive
/home/daclash/linux/fs/namei.c:260
     22e:       55                      push   %rbp
     22f:       48 89 e5                mov    %rsp,%rbp
     232:       41 54                   push   %r12
arch_atomic_read():
/home/daclash/linux/./arch/x86/include/asm/atomic.h:23
     234:       8b 47 10                mov    0x10(%rdi),%eax
     237:       49 89 fc                mov    %rdi,%r12
raw_atomic_dec_if_positive():
/home/daclash/linux/./include/linux/atomic/atomic-arch-fallback.h:2535
     23a:       89 c2                   mov    %eax,%edx
     23c:       83 ea 01                sub    $0x1,%edx
     23f:       78 50                   js     291 <putname+0x71>
arch_atomic_try_cmpxchg():
/home/daclash/linux/./arch/x86/include/asm/atomic.h:115
     241:       f0 41 0f b1 54 24 10    lock cmpxchg %edx,0x10(%r12)
     248:       75 f0                   jne    23a <putname+0x1a>
putname():
/home/daclash/linux/fs/namei.c:269
     24a:       85 d2                   test   %edx,%edx
     24c:       75 22                   jne    270 <putname+0x50>




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux