Re: [PATCH] NFSV4: fix rpc_task use-after-free when open concurrently

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





在 2024/10/11 15:08, yangerkun 写道:


在 2024/10/10 22:14, Trond Myklebust 写道:
On Thu, 2024-10-10 at 21:40 +0800, yangerkun wrote:


...




The logic of this place is a little complicated for me. I'd like to
simplify it:

Actually, for the normal case, we no need call nfs_release_seqid
here.
The second task will wait until the first task completes its work,
call
rpc_wake_up_queued_task in
nfs_release_seqid(_nfs4_do_open->nfs4_opendata_put-
nfs4_opendata_free->_,
and the first rpc_task already be freed in nfs4_run_open_task) to
wake
up the second task, allowing it to complete the remaining open
operation. So, no use-after-free will happen.

Based on this, we should only call nfs_release_seqid for the abnormal
case:

      - nfs4_opendata->cancelled == true: will return directly in
_nfs4_proc_open

This test is not needed. The question of whether or not the caller is
still waiting for the RPC call to complete is irrelevant to the problem
at hand, and just adds unnecessary testing overhead for the 99.9% case
where the OPEN is successful and the caller is waiting to complete the
call.

Yes, agree.



      - **or** nfs4_opendata->rpc_status != 0: the status will be the
return value for nfs4_run_open_task, then _nfs4_proc_open will return
directly

This test is relevant. If the RPC call is interrupted, then it might
have happened while we were waiting to be given the sequence lock.

Yes, agree.



      - **or** nfs4_opendata->rpc_done != true: _nfs4_proc_open will
return directly

This test is relevant, but only in conjunction with the test for -
rpc_status != 0.

If ->rpc_status == 0, then we know that we already hold the sequence
lock and so there is no danger of the use-after-free occurring.

Please see the latter detail.

Furthermore, we want to keep holding that lock so that we can safely
call nfs4_try_open_cached() without having to worry about conflicting
CLOSE calls (this is only true for NFSv4.0 - NFSv4.1 can resolve
conflicts using nfs41_test_and_free_expired_stateid()).

Yes, agree.


Thanks a lot for the detail explain, but with latter kernel diff and testcase(sleep in kernel will help increase the recurrence probability), use-after-free can still happen.


# kernel diff:

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index c2045a2a9d0f..1502f39ee18c 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -250,6 +250,7 @@ struct nfs4_opendata {
         bool is_recover;
         bool cancelled;
         int rpc_status;
+       struct rpc_task *task;
  };

  struct nfs4_add_xprt_data {
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 4685621ba469..04fdc019965e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2506,6 +2506,9 @@ static void nfs4_open_prepare(struct rpc_task *task, void *calldata)
         struct nfs_client *clp = sp->so_server->nfs_client;
         enum open_claim_type4 claim = data->o_arg.claim;

+       if (!(data->task))
+               data->task = task;
+       printk("%s data %llx task %llx\n", __func__, data, task);
         if (nfs_wait_on_sequence(data->o_arg.seqid, task) != 0)
                 goto out_wait;
         /*
@@ -2603,7 +2606,10 @@ static void nfs4_open_release(void *calldata)
         struct nfs4_opendata *data = calldata;
         struct nfs4_state *state = NULL;

+       printk("%s data %llx task %llx cancelled %d rpc_status %d rpc_done %d tk_status %d\n", +               __func__, data, data->task, data->cancelled, data->rpc_status, data->rpc_done, data->task->tk_status);
+       if (data->rpc_status != 0 && !data->rpc_done)
+               nfs_release_seqid(data->o_arg.seqid);
         /* If this request hasn't been cancelled, do nothing */
         if (!data->cancelled)
                 goto out_free;
@@ -2669,12 +2675,14 @@ static int nfs4_run_open_task(struct nfs4_opendata *data,
         task = rpc_run_task(&task_setup_data);
         if (IS_ERR(task))
                 return PTR_ERR(task);
+       msleep(100);
         status = rpc_wait_for_completion_task(task);
         if (status != 0) {
                 data->cancelled = true;
                 smp_wmb();
         } else
                 status = data->rpc_status;
+       msleep(100);
         rpc_put_task(task);

         return status;
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 877f682b45f2..81d31b7f723f 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1100,6 +1100,7 @@ void nfs_release_seqid(struct nfs_seqid *seqid)

                 next = list_first_entry(&sequence->list,
                                 struct nfs_seqid, list);
+               mdelay(100);
                 rpc_wake_up_queued_task(&sequence->wait, next->task);
         }
         spin_unlock(&sequence->lock);




# testcase:

#!/bin/bash

mkdir /mnt/sda
mkdir /mnt/sdb
mkdir /mnt1
touch /mnt/sda/file
touch /mnt/sdb/file
echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports
echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports
exportfs -ra
service nfs-server start
while true; do umount -f /mnt1; done &
while true; do
         echo "will mount"
         mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1
         exec 4<> /mnt1/sda/file &
         exec 5<> /mnt1/sdb/file &
         wait
         umount /mnt1
         echo "umount ok"
done


[   94.159337] nfs4_open_prepare data ff11000006eec800 task ff110000472e9040 [   94.172653] nfs4_open_prepare data ff11000006eed800 task ff110000472e88c0 [   94.369424] nfs4_open_release data ff11000006eec800 task ff110000472e9040 cancelled 0 rpc_status -10013 rpc_done 1 tk_status -10013 [   94.417483] nfs4_open_release data ff11000006eed800 task ff110000472e88c0 cancelled 0 rpc_status -512 rpc_done 1 tk_status -512 [   94.476518] ================================================================== [   94.477340] BUG: KASAN: slab-use-after-free in rpc_wake_up_queued_task+0x2c/0xc0
[   94.478850] Read of size 8 at addr ff110000472e88f0 by task sh/856
[   94.479553]
[   94.479818] CPU: 2 UID: 0 PID: 856 Comm: sh Not tainted 6.11.0-09961-g2db5fff64932-dirty #23 [   94.481326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
[   94.483107] Call Trace:
[   94.483683]  <TASK>
[   94.483924]  dump_stack_lvl+0xa3/0x120
[   94.484874]  print_address_description.constprop.0+0x63/0x510
[   94.486651]  print_report+0xf5/0x360
[   94.490550]  kasan_report+0xd9/0x140
[   94.491764]  kasan_check_range+0x2d1/0x300
[   94.492365]  __kasan_check_read+0x21/0x30
[   94.492817]  rpc_wake_up_queued_task+0x2c/0xc0
[   94.493377]  nfs_release_seqid+0x1f8/0x300
[   94.493935]  nfs_free_seqid+0x1a/0x40
[   94.494421]  nfs4_opendata_free+0xc6/0x3e0
[   94.495078]  _nfs4_do_open+0xc0a/0x13b0
[   94.504619]  nfs4_do_open+0x26d/0x5e0
[   94.507678]  nfs4_atomic_open+0x2c6/0x3a0
[   94.509928]  nfs_atomic_open+0x4f8/0x1180
[   94.514713]  atomic_open+0x186/0x4e0
[   94.515142]  lookup_open.isra.0+0x3e7/0x15b0
[   94.516519]  open_last_lookups+0x85d/0x1260
[   94.517032]  path_openat+0x151/0x7b0
[   94.518016]  do_filp_open+0x1e0/0x310
[   94.521059]  do_sys_openat2+0x178/0x1f0
[   94.524375]  do_sys_open+0xa2/0x100
[   94.526394]  __x64_sys_openat+0xa8/0x120
[   94.526857]  x64_sys_call+0x2507/0x4540
[   94.527289]  do_syscall_64+0xa7/0x240
[   94.527760]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
[   94.536712]
[   94.536896] Allocated by task 857:
[   94.537321]  kasan_save_stack+0x3b/0x70
[   94.537810]  kasan_save_track+0x1c/0x40
[   94.538330]  kasan_save_alloc_info+0x44/0x70
[   94.538829]  __kasan_slab_alloc+0xaf/0xc0
[   94.539298]  kmem_cache_alloc_noprof+0x1e0/0x4f0
[   94.539903]  rpc_new_task+0xe7/0x220
[   94.540321]  rpc_run_task+0x27/0x7d0
[   94.540836]  nfs4_run_open_task+0x477/0x810
[   94.541380]  _nfs4_proc_open+0xc0/0x6d0
[   94.541905]  _nfs4_open_and_get_state+0x178/0xc50
[   94.542476]  _nfs4_do_open+0x4a6/0x13b0
[   94.542967]  nfs4_do_open+0x26d/0x5e0
[   94.543473]  nfs4_atomic_open+0x2c6/0x3a0
[   94.543947]  nfs_atomic_open+0x4f8/0x1180
[   94.544398]  atomic_open+0x186/0x4e0
[   94.544862]  lookup_open.isra.0+0x3e7/0x15b0
[   94.545406]  open_last_lookups+0x85d/0x1260
[   94.545878]  path_openat+0x151/0x7b0
[   94.546387]  do_filp_open+0x1e0/0x310
[   94.546868]  do_sys_openat2+0x178/0x1f0
[   94.547341]  do_sys_open+0xa2/0x100
[   94.547799]  __x64_sys_openat+0xa8/0x120
[   94.548335]  x64_sys_call+0x2507/0x4540
[   94.548819]  do_syscall_64+0xa7/0x240
[   94.549317]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   94.549969]
[   94.550177] Freed by task 857:
[   94.550603]  kasan_save_stack+0x3b/0x70
[   94.551101]  kasan_save_track+0x1c/0x40
[   94.551558]  kasan_save_free_info+0x43/0x80
[   94.552140]  __kasan_slab_free+0x4f/0x90
[   94.552587]  kmem_cache_free+0x199/0x4f0
[   94.553111]  mempool_free_slab+0x1f/0x30
[   94.553600]  mempool_free+0xdf/0x3d0
[   94.554072]  rpc_free_task+0x12d/0x180
[   94.554525]  rpc_final_put_task+0x10e/0x150
[   94.555022]  rpc_do_put_task+0x63/0x80
[   94.555532]  rpc_put_task+0x18/0x30
[   94.556004]  nfs4_run_open_task+0x506/0x810
[   94.556491]  _nfs4_proc_open+0xc0/0x6d0
[   94.557008]  _nfs4_open_and_get_state+0x178/0xc50
[   94.557551]  _nfs4_do_open+0x4a6/0x13b0
[   94.557982]  nfs4_do_open+0x26d/0x5e0
[   94.558464]  nfs4_atomic_open+0x2c6/0x3a0
[   94.558968]  nfs_atomic_open+0x4f8/0x1180
[   94.559428]  atomic_open+0x186/0x4e0
[   94.559818]  lookup_open.isra.0+0x3e7/0x15b0
[   94.560327]  open_last_lookups+0x85d/0x1260
[   94.560831]  path_openat+0x151/0x7b0
[   94.561330]  do_filp_open+0x1e0/0x310
[   94.561770]  do_sys_openat2+0x178/0x1f0
[   94.562184]  do_sys_open+0xa2/0x100
[   94.562563]  __x64_sys_openat+0xa8/0x120
[   94.563057]  x64_sys_call+0x2507/0x4540
[   94.563532]  do_syscall_64+0xa7/0x240
[   94.563956]  entry_SYSCALL_64_after_hwframe+0x76/0x7e







Things go like this:

Thread A                            Thread B

// _nfs4_do_open
opendata1 and seqid1

// nfs4_run_open_task
rpc_task1
                                    // _nfs4_do_open
                                    opendata2 and seqid2

                                    // nfs4_run_open_task
                                    rpc_task2

// nfs4_run_open_task
rpc_task1 free


-------------------------------------------------------
                     umount -f generate signal
-------------------------------------------------------
                                   // wakeup by signal
                                   rpc_task2 free, seqid2 still
                   alive since rpc_status == -512,
                                   rpc_done == true



// _nfs4_do_open
put opendata1 and
then nfs_free_seqid
release seqid1, wakeup
rpc_task2, but task2 has
been freed, UAF!!!!!!

                                   // _nfs4_do_open
                                   put opendata2 and release seqid2,
                                   but it's too later...


Actually, once seqid2 can be released correctly, use-after-free may also trigger...

Thread A                            Thread B

// _nfs4_do_open
opendata1 and seqid1

// nfs4_run_open_task
rpc_task1
                                    // _nfs4_do_open
                                    opendata2 and seqid2

                                    // nfs4_run_open_task
                                    rpc_task2

// nfs4_run_open_task
rpc_task1 free


-------------------------------------------------------
                     umount -f generate signal
-------------------------------------------------------
                                   // wakeup by signal
                                   rpc_task2 free, and nfs_free_seqid
                                   will be trigger, try to wakeup
                                   rpc_task1, trigger UAF!!!!



// _nfs4_do_open
put opendata1 and
then nfs_free_seqid
release seqid1, but
too later...
                                   // _nfs4_do_open
                                   put opendata2 and release seqid2



How about ping task->tk_count, and not release it until nfs_release_seqid...

After testing, there are some problems with the code here, please ignore it.


diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 877f682b45f2..15eb6ef611fb 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1095,11 +1095,14 @@ void nfs_release_seqid(struct nfs_seqid *seqid)
         sequence = seqid->sequence;
         spin_lock(&sequence->lock);
         list_del_init(&seqid->list);
+       if (seqid->task)
+               rpc_put_task(seqid->task);
         if (!list_empty(&sequence->list)) {
                 struct nfs_seqid *next;

                 next = list_first_entry(&sequence->list,
                                 struct nfs_seqid, list);
+               mdelay(100);
                 rpc_wake_up_queued_task(&sequence->wait, next->task);
         }
         spin_unlock(&sequence->lock);
@@ -1181,8 +1184,10 @@ int nfs_wait_on_sequence(struct nfs_seqid *seqid, struct rpc_task *task)
         sequence = seqid->sequence;
         spin_lock(&sequence->lock);
         seqid->task = task;
-       if (list_empty(&seqid->list))
+       if (list_empty(&seqid->list)) {
                 list_add_tail(&seqid->list, &sequence->list);
+               atomic_inc(&task->tk_count);
+       }
        if (list_first_entry(&sequence->list, struct nfs_seqid, list) == seqid)
                 goto unlock;
         rpc_sleep_on(&sequence->wait, task, NULL);







diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b8ffbe52ba15..d2d7d84f0ed2 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2603,6 +2603,8 @@ static void nfs4_open_release(void *calldata)
          struct nfs4_opendata *data = calldata;
          struct nfs4_state *state = NULL;

+       if (data->cancelled || data->rpc_status != 0 || !data-
rpc_done)
+               nfs_release_seqid(data->o_arg.seqid);

          /* If this request hasn't been cancelled, do nothing */
          if (!data->cancelled)
                  goto out_free;


Looking forward to your reply!


See inlined comments above. Thanks!






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux