On Mon, Oct 10 2022 at 10:39P -0400, Luo Meng <luomeng@xxxxxxxxxxxxxxx> wrote:

> From: Luo Meng <luomeng12@xxxxxxxxxx>
>
> When dm_resume() and dm_destroy() run concurrently, a use-after-free
> (UAF) can occur.
>
> One such concurrent UAF is shown below:
>
>                 use                          free
> do_resume                          |
>   __find_device_hash_cell          |
>     dm_get                         |
>       atomic_inc(&md->holders)     |
>                                    | dm_destroy
>                                    |   __dm_destroy
>                                    |     if (!dm_suspended_md(md))
>                                    |       atomic_read(&md->holders)
>                                    |       msleep(1)
>   dm_resume                        |
>     __dm_resume                    |
>       dm_table_resume_targets      |
>         pool_resume                |
>           do_waker  # adds delayed work |
>                                    |     dm_table_destroy
>                                    |       pool_dtr
>                                    |         __pool_dec
>                                    |           __pool_destroy
>                                    |             destroy_workqueue
>                                    |             kfree(pool)  # free pool
> time out
> __do_softirq
>   run_timer_softirq  # pool has already been freed
>
> This can be easily reproduced using:
> 1. create thin-pool
> 2. dmsetup suspend pool
> 3. dmsetup resume pool
> 4. dmsetup remove_all  # concurrent with 3
>
> The root cause of the UAF is that dm_resume() arms the timer after
> dm_destroy() has skipped canceling it because of the suspend status.
> When the timer later expires, run_timer_softirq() runs, but the pool
> has already been freed, so the concurrent UAF occurs.
>
> Therefore, cancel the timer only after md->holders has dropped to zero.
>
> Signed-off-by: Luo Meng <luomeng12@xxxxxxxxxx>
> ---
>  drivers/md/dm.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 60549b65c799..379525313628 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -2420,6 +2420,19 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>  
>  	blk_mark_disk_dead(md->disk);
>  
> +	/*
> +	 * Rare, but there may be I/O requests still going to complete,
> +	 * for example.  Wait for all references to disappear.
> +	 * No one should increment the reference count of the mapped_device,
> +	 * after the mapped_device state becomes DMF_FREEING.
> +	 */
> +	if (wait)
> +		while (atomic_read(&md->holders))
> +			msleep(1);
> +	else if (atomic_read(&md->holders))
> +		DMWARN("%s: Forcibly removing mapped_device still in use! (%d users)",
> +		       dm_device_name(md), atomic_read(&md->holders));
> +
>  	/*
>  	 * Take suspend_lock so that presuspend and postsuspend methods
>  	 * do not race with internal suspend.
> @@ -2436,19 +2449,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
>  	dm_put_live_table(md, srcu_idx);
>  	mutex_unlock(&md->suspend_lock);
>  
> -	/*
> -	 * Rare, but there may be I/O requests still going to complete,
> -	 * for example.  Wait for all references to disappear.
> -	 * No one should increment the reference count of the mapped_device,
> -	 * after the mapped_device state becomes DMF_FREEING.
> -	 */
> -	if (wait)
> -		while (atomic_read(&md->holders))
> -			msleep(1);
> -	else if (atomic_read(&md->holders))
> -		DMWARN("%s: Forcibly removing mapped_device still in use! (%d users)",
> -		       dm_device_name(md), atomic_read(&md->holders));
> -
>  	dm_table_destroy(__unbind(md));
>  	free_dev(md);
>  }
> -- 
> 2.31.1
>

Thanks for the report, but your fix seems wrong.  A thin-pool-specific
fix seems much more appropriate.  Does this fix the issue?

diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index e76c96c760a9..dc271c107fb5 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -2889,6 +2889,8 @@ static void __pool_destroy(struct pool *pool)
 	dm_bio_prison_destroy(pool->prison);
 	dm_kcopyd_client_destroy(pool->copier);
 
+	cancel_delayed_work_sync(&pool->waker);
+	cancel_delayed_work_sync(&pool->no_space_timeout);
+
 	if (pool->wq)
 		destroy_workqueue(pool->wq);

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel