Re: [PATCH v2] poll: Fix use-after-free in poll_freewait()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Suren:

kernel config:
x86_64_defconfig
CONFIG_PSI=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y

I make some change in code, in order to increase the recurrence probability.
diff --git a/fs/select.c b/fs/select.c
index 5edffee1162c..5ee5b74a8386 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -139,6 +139,7 @@ void poll_freewait(struct poll_wqueues *pwq)
 {
        struct poll_table_page * p = pwq->table;
        int i;
+       mdelay(50);
        for (i = 0; i < pwq->inline_index; i++)
                free_poll_entry(pwq->inline_entries + i);
        while (p) {

Here is the simple repo test.sh:
#!/bin/bash

RESOURCE_TYPES=("cpu" "memory" "io" "irq")
#RESOURCE_TYPES=("cpu")
cgroup_num=50
test_dir=/sys/fs/cgroup/test

function restart_cgroup() {
        num=$(expr $RANDOM % $cgroup_num + 1)
        rmdir $test_dir/test_$num
        mkdir $test_dir/test_$num
}

function create_triggers() {
        num=$(expr $RANDOM % $cgroup_num + 1)
        random=$(expr $RANDOM % "${#RESOURCE_TYPES[@]}")
        psi_type="${RESOURCE_TYPES[${random}]}"
        ./psi_monitor $test_dir/test_$num $psi_type &
}

mkdir $test_dir
for i in $(seq 1 $cgroup_num)
do
        mkdir $test_dir/test_$i
done
for j in $(seq 1 100)
do
        restart_cgroup &
        create_triggers &
done

psi_monitor.c:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
        const char trig[] = "full 1000000 1000000";
        struct pollfd fds;
        char filename[100];

        sprintf(filename, "%s/%s.pressure", argv[1], argv[2]);

        fds.fd = open(filename, O_RDWR | O_NONBLOCK);
        if (fds.fd < 0) {
                printf("%s open error: %s\n", filename,strerror(errno));
                return 1;
        }
        fds.events = POLLPRI;
        if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
                printf("%s write error: %s\n",filename,strerror(errno));
                return 1;
        }
        while (1) {
                poll(&fds, 1, -1);
        }
        close(fds.fd);
        return 0;
}
Thanks,
Lu
在 2023/6/16 7:13, Suren Baghdasaryan 写道:
On Wed, Jun 14, 2023 at 11:19 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:

On Wed, Jun 14, 2023 at 10:40 AM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:

On Wed, Jun 14, 2023 at 03:07:33PM +0800, Lu Jialin wrote:
We found a UAF bug in remove_wait_queue as follows:

==================================================================
BUG: KASAN: use-after-free in _raw_spin_lock_irqsave+0x71/0xe0
Write of size 4 at addr ffff8881150d7b28 by task psi_trigger/15306
Call Trace:
  dump_stack+0x9c/0xd3
  print_address_description.constprop.0+0x19/0x170
  __kasan_report.cold+0x6c/0x84
  kasan_report+0x3a/0x50
  check_memory_region+0xfd/0x1f0
  _raw_spin_lock_irqsave+0x71/0xe0
  remove_wait_queue+0x26/0xc0
  poll_freewait+0x6b/0x120
  do_sys_poll+0x305/0x400
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x61/0xc6

Allocated by task 15306:
  kasan_save_stack+0x1b/0x40
  __kasan_kmalloc.constprop.0+0xb5/0xe0
  psi_trigger_create.part.0+0xfc/0x450
  cgroup_pressure_write+0xfc/0x3b0
  cgroup_file_write+0x1b3/0x390
  kernfs_fop_write_iter+0x224/0x2e0
  new_sync_write+0x2ac/0x3a0
  vfs_write+0x365/0x430
  ksys_write+0xd5/0x1b0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x61/0xc6

Freed by task 15850:
  kasan_save_stack+0x1b/0x40
  kasan_set_track+0x1c/0x30
  kasan_set_free_info+0x20/0x40
  __kasan_slab_free+0x151/0x180
  kfree+0xba/0x680
  cgroup_file_release+0x5c/0xe0
  kernfs_drain_open_files+0x122/0x1e0
  kernfs_drain+0xff/0x1e0
  __kernfs_remove.part.0+0x1d1/0x3b0
  kernfs_remove_by_name_ns+0x89/0xf0
  cgroup_addrm_files+0x393/0x3d0
  css_clear_dir+0x8f/0x120
  kill_css+0x41/0xd0
  cgroup_destroy_locked+0x166/0x300
  cgroup_rmdir+0x37/0x140
  kernfs_iop_rmdir+0xbb/0xf0
  vfs_rmdir.part.0+0xa5/0x230
  do_rmdir+0x2e0/0x320
  __x64_sys_unlinkat+0x99/0xc0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x61/0xc6
==================================================================

If using epoll(), wake_up_pollfree will empty waitqueue and set
wait_queue_head is NULL before free waitqueue of psi trigger. But is
doesn't work when using poll(), which will lead a UAF problem in
poll_freewait coms as following:

(cgroup_rmdir)                      |
psi_trigger_destroy                 |
   wake_up_pollfree(&t->event_wait)  |
    synchronize_rcu();               |
     kfree(t)                        |
                                   |   (poll_freewait)
                                   |     free_poll_entry(pwq->inline_entries + i)
                                   |       remove_wait_queue(entry->wait_address)
                                   |         spin_lock_irqsave(&wq_head->lock)

entry->wait_address in poll_freewait() is t->event_wait in cgroup_rmdir().
t->event_wait is free in psi_trigger_destroy before call poll_freewait(),
therefore wq_head in poll_freewait() has been already freed, which would
lead to a UAF.

Hi Lu,
Could you please share your reproducer along with the kernel config
you used? I'm trying to reproduce this UAF but every time I delete the
cgroup being polled, poll() simply returns POLLERR.
Thanks,
Suren.


similar problem for epoll() has been fixed commit c2dbe32d5db5
("sched/psi: Fix use-after-free in ep_remove_wait_queue()").
epoll wakeup function ep_poll_callback() will empty waitqueue and set
wait_queue_head is NULL when pollflags is POLLFREE and judge pwq->whead
is NULL or not before remove_wait_queue in ep_remove_wait_queue(),
which will fix the UAF bug in ep_remove_wait_queue.

But poll wakeup function pollwake() doesn't do that. To fix the
problem, we empty waitqueue and set wait_address is NULL in pollwake() when
key is POLLFREE. otherwise in remove_wait_queue, which is similar to
epoll().

Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Suggested-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Link: https://lore.kernel.org/all/CAJuCfpEoCRHkJF-=1Go9E94wchB4BzwQ1E3vHGWxNe+tEmSJoA@xxxxxxxxxxxxxx/#t
Signed-off-by: Lu Jialin <lujialin4@xxxxxxxxxx>
---
v2: correct commit msg and title suggested by Suren Baghdasaryan
---
  fs/select.c | 20 +++++++++++++++++++-
  1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/select.c b/fs/select.c
index 0ee55af1a55c..e64c7b4e9959 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -132,7 +132,17 @@ EXPORT_SYMBOL(poll_initwait);

  static void free_poll_entry(struct poll_table_entry *entry)
  {
-     remove_wait_queue(entry->wait_address, &entry->wait);
+     wait_queue_head_t *whead;
+
+     rcu_read_lock();
+     /* If it is cleared by POLLFREE, it should be rcu-safe.
+      * If we read NULL we need a barrier paired with smp_store_release()
+      * in pollwake().
+      */
+     whead = smp_load_acquire(&entry->wait_address);
+     if (whead)
+             remove_wait_queue(whead, &entry->wait);
+     rcu_read_unlock();
       fput(entry->filp);
  }

@@ -215,6 +225,14 @@ static int pollwake(wait_queue_entry_t *wait, unsigned mode, int sync, void *key
       entry = container_of(wait, struct poll_table_entry, wait);
       if (key && !(key_to_poll(key) & entry->key))
               return 0;
+     if (key_to_poll(key) & POLLFREE) {
+             list_del_init(&wait->entry);
+             /* wait_address !=NULL protects us from the race with
+              * poll_freewait().
+              */
+             smp_store_release(&entry->wait_address, NULL);
+             return 0;
+     }
       return __pollwake(wait, mode, sync, key);

I don't understand why this patch is needed.

The last time I looked at POLLFREE, it is only needed because of asynchronous
polls.  See my explanation in the commit message of commit 50252e4b5e989ce6.

Ah, I missed that. Thanks for the correction.


In summary, POLLFREE solves the problem of polled waitqueues whose lifetime is
tied to the current task rather than to the file being polled.  Also refer to
the comment above wake_up_pollfree(), which mentions this.

fs/select.c is synchronous polling, not asynchronous.  Therefore, it should not
need to handle POLLFREE.

If there's actually a bug here, most likely it's a bug in psi_trigger_poll()
where it is using a waitqueue whose lifetime is tied to neither the current task
nor the file being polled.  That needs to be fixed.

Yeah. We discussed this issue in
https://lore.kernel.org/all/CAJuCfpFb0J5ZwO6kncjRG0_4jQLXUy-_dicpH5uGiWP8aKYEJQ@xxxxxxxxxxxxxx
and the root cause is that cgroup_file_release() where
psi_trigger_destroy() is called is not tied to the cgroup file's real
lifetime (see my analysis here:
https://lore.kernel.org/all/CAJuCfpFZ3B4530TgsSHqp5F_gwfrDujwRYewKReJru==MdEHQg@xxxxxxxxxxxxxx/#t).
I guess it's time to do a deeper surgery and figure out a way to call
psi_trigger_destroy() when the polled cgroup file is actually being
destroyed. I'll take a closer look into this later today.
A fix will likely require some cgroup or kernfs code changes, so
CC'ing Tejun for visibility.
Thanks,
Suren.


- Eric
.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux