Re: [PATCH] audit,io_uring: io_uring openat triggers audit reference count underflow

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/13/23 2:24 AM, Christian Brauner wrote:
> On Thu, Oct 12, 2023 at 02:55:18PM -0700, Dan Clash wrote:
>> An io_uring openat operation can update an audit reference count
>> from multiple threads resulting in the call trace below.
>>
>> A call to io_uring_submit() with a single openat op with a flag of
>> IOSQE_ASYNC results in the following reference count updates.
>>
>> These first part of the system call performs two increments that do not race.
>>
>> do_syscall_64()
>>   __do_sys_io_uring_enter()
>>     io_submit_sqes()
>>       io_openat_prep()
>>         __io_openat_prep()
>>           getname()
>>             getname_flags()       /* update 1 (increment) */
>>               __audit_getname()   /* update 2 (increment) */
>>
>> The openat op is queued to an io_uring worker thread which starts the
>> opportunity for a race.  The system call exit performs one decrement.
>>
>> do_syscall_64()
>>   syscall_exit_to_user_mode()
>>     syscall_exit_to_user_mode_prepare()
>>       __audit_syscall_exit()
>>         audit_reset_context()
>>            putname()              /* update 3 (decrement) */
>>
>> The io_uring worker thread performs one increment and two decrements.
>> These updates can race with the system call decrement.
>>
>> io_wqe_worker()
>>   io_worker_handle_work()
>>     io_wq_submit_work()
>>       io_issue_sqe()
>>         io_openat()
>>           io_openat2()
>>             do_filp_open()
>>               path_openat()
>>                 __audit_inode()   /* update 4 (increment) */
>>             putname()             /* update 5 (decrement) */
>>         __audit_uring_exit()
>>           audit_reset_context()
>>             putname()             /* update 6 (decrement) */
>>
>> The fix is to change the refcnt member of struct audit_names
>> from int to atomic_t.
>>
>> kernel BUG at fs/namei.c:262!
>> Call Trace:
>> ...
>>  ? putname+0x68/0x70
>>  audit_reset_context.part.0.constprop.0+0xe1/0x300
>>  __audit_uring_exit+0xda/0x1c0
>>  io_issue_sqe+0x1f3/0x450
>>  ? lock_timer_base+0x3b/0xd0
>>  io_wq_submit_work+0x8d/0x2b0
>>  ? __try_to_del_timer_sync+0x67/0xa0
>>  io_worker_handle_work+0x17c/0x2b0
>>  io_wqe_worker+0x10a/0x350
>>
>> Cc: <stable@xxxxxxxxxxxxxxx>
>> Link: https://lore.kernel.org/lkml/MW2PR2101MB1033FFF044A258F84AEAA584F1C9A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>> Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
>> Signed-off-by: Dan Clash <daclash@xxxxxxxxxxxxxxxxxxx>
>> ---
>>  fs/namei.c         | 9 +++++----
>>  include/linux/fs.h | 2 +-
>>  kernel/auditsc.c   | 8 ++++----
>>  3 files changed, 10 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 567ee547492b..94565bd7e73f 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -188,7 +188,7 @@ getname_flags(const char __user *filename, int flags, int *empty)
>>  		}
>>  	}
>>  
>> -	result->refcnt = 1;
>> +	atomic_set(&result->refcnt, 1);
>>  	/* The empty path is special. */
>>  	if (unlikely(!len)) {
>>  		if (empty)
>> @@ -249,7 +249,7 @@ getname_kernel(const char * filename)
>>  	memcpy((char *)result->name, filename, len);
>>  	result->uptr = NULL;
>>  	result->aname = NULL;
>> -	result->refcnt = 1;
>> +	atomic_set(&result->refcnt, 1);
>>  	audit_getname(result);
>>  
>>  	return result;
>> @@ -261,9 +261,10 @@ void putname(struct filename *name)
>>  	if (IS_ERR(name))
>>  		return;
>>  
>> -	BUG_ON(name->refcnt <= 0);
>> +	if (WARN_ON_ONCE(!atomic_read(&name->refcnt)))
>> +		return;
>>  
>> -	if (--name->refcnt > 0)
>> +	if (!atomic_dec_and_test(&name->refcnt))
>>  		return;
> 
> Fine by me. I'd write this as:
> 
> count = atomic_dec_if_positive(&name->refcnt);
> if (WARN_ON_ONCE(unlikely(count < 0))
> 	return;
> if (count > 0)
> 	return;

Would be fine too, my suspicion was that most archs don't implement a
primitive for that, and hence it might be more expensive than
atomic_read()/atomic_dec_and_test() which do. But I haven't looked at
the code generation. The dec_if_positive degenerates to a atomic cmpxchg
for most cases.

-- 
Jens Axboe





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux