Re: [PATCH v2] writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/5/3 8:17, Andrew Morton wrote:
On Mon, 10 Apr 2023 21:08:26 +0800 Baokun Li <libaokun1@xxxxxxxxxx> wrote:

KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
  <TASK>
  dump_stack_lvl+0x7f/0xc0
  print_report+0x2ba/0x340
  kasan_report+0xc4/0x120
  kasan_check_range+0x1b7/0x2e0
  __kasan_check_write+0x24/0x40
  bdi_split_work_to_wbs+0x5c5/0x7b0
  sync_inodes_sb+0x195/0x630
  sync_inodes_one_sb+0x3a/0x50
  iterate_supers+0x106/0x1b0
  ksys_sync+0x98/0x160
[...]
==================================================================

The race that causes the above issue is as follows:

            cpu1                     cpu2
-------------------------|-------------------------
inode_switch_wbs
  INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
  queue_rcu_work(isw_wq, &isw->work)
  // queue_work async
   inode_switch_wbs_work_fn
    wb_put_many(old_wb, nr_switched)
     percpu_ref_put_many
      ref->data->release(ref)
      cgwb_release
       queue_work(cgwb_release_wq, &wb->release_work)
       // queue_work async
        &wb->release_work
        cgwb_release_workfn
                             ksys_sync
                              iterate_supers
                               sync_inodes_one_sb
                                sync_inodes_sb
                                 bdi_split_work_to_wbs
                                  kmalloc(sizeof(*work), GFP_ATOMIC)
                                  // alloc memory failed
         percpu_ref_exit
          ref->data = NULL
          kfree(data)
                                  wb_get(wb)
                                   percpu_ref_get(&wb->refcnt)
                                    percpu_ref_get_many(ref, 1)
                                     atomic_long_add(nr, &ref->data->count)
                                      atomic64_add(i, v)
                                      // trigger null-ptr-deref

bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all wbs.
If the allocation of new work fails, the on-stack fallback will be used and
the reference count of the current wb is increased afterwards. If cgroup
writeback membership switches occur before getting the reference count and
the current wb is released as old_wd, then calling wb_get() or wb_put()
will trigger the null pointer dereference above.

This issue was introduced in v4.3-rc7 (see fix tag1). Both sync_inodes_sb()
and __writeback_inodes_sb_nr() calls to bdi_split_work_to_wbs() can trigger
this issue. For scenarios called via sync_inodes_sb(), originally commit
7fc5854f8c6e ("writeback: synchronize sync(2) against cgroup writeback
membership switches") reduced the possibility of the issue by adding
wb_switch_rwsem, but in v5.14-rc1 (see fix tag2) removed the
"inode_io_list_del_locked(inode, old_wb)" from inode_switch_wbs_work_fn()
so that wb->state contains WB_has_dirty_io, thus old_wb is not skipped
when traversing wbs in bdi_split_work_to_wbs(), and the issue becomes
easily reproducible again.

To solve this problem, percpu_ref_exit() is called under RCU protection
to avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.

Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Fixes: f3b6a6df38aa ("writeback, cgroup: keep list of inodes attached to bdi_writeback")
Cc Roman for this second commit.
Thanks! I forgot to cc to Roman!

Cc: stable@xxxxxxxxxxxxxxx
Having two Fixes: is awkward.  These serve as a guide to tell -stable
maintainers which kernels need the fix.  Can we be more precise?
Yes! I'm very sorry about that. As I replied earlier, this issue was introduced in v4.3-rc7.
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -978,6 +978,16 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
  			continue;
  		}
+ /*
+		 * If wb_tryget fails, the wb has been shutdown, skip it.
+		 *
+		 * Pin @wb so that it stays on @bdi->wb_list.  This allows
+		 * continuing iteration from @wb after dropping and
+		 * regrabbing rcu read lock.
+		 */
+		if (!wb_tryget(wb))
+			continue;
+
  		/* alloc failed, execute synchronously using on-stack fallback */
  		work = &fallback_work;
  		*work = *base_work;
@@ -986,13 +996,6 @@ static void bdi_split_work_to_wbs(struct backing_dev_info *bdi,
  		work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
-		/*
-		 * Pin @wb so that it stays on @bdi->wb_list.  This allows
-		 * continuing iteration from @wb after dropping and
-		 * regrabbing rcu read lock.
-		 */
-		wb_get(wb);
  		last_wb = wb;
rcu_read_unlock();
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index ad011308cebe..43b48750b491 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
  static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
  static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+	struct bdi_writeback *wb = container_of(rcu_head,
+			struct bdi_writeback, rcu);
nit:

	struct bdi_writeback *wb;

	wb = container_of(rcu_head, struct bdi_writeback, rcu);

looks nicer, no?

Yes, it looks neater.


+	percpu_ref_exit(&wb->refcnt);
+	kfree(wb);
+}
+
  static void cgwb_release_workfn(struct work_struct *work)
  {
  	struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct work_struct *work)
  	list_del(&wb->offline_node);
  	spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
  	wb_exit(wb);
  	bdi_put(bdi);
  	WARN_ON_ONCE(!list_empty(&wb->b_attached));
-	kfree_rcu(wb, rcu);
+	call_rcu(&wb->rcu, cgwb_free_rcu);
  }
static void cgwb_release(struct percpu_ref *refcnt)


Thanks!
--
With Best Regards,
Baokun Li
.



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux