Quoting Ulf Hansson (2021-04-16 16:07:11) > On Fri, 16 Apr 2021 at 19:50, Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote: > > > > Ok. Can you share the KASAN reports? The memory allocated for this class > > object and associated index from the IDA will be unique for each call > > the mmc_alloc_host() so I don't see how KASAN could be noticing > > something unless a reference to the host is leaking out without the > > proper get_device() call being made to keep the underlying memory from > > being freed. > > Removing the host, also means removing the card. Although, as I said, > I need some more time to think of the best solution. > > Here's a report, triggered with some manual hacks and by using the > mmc-utils usesland tool. Thanks! I'm just getting back to this mail on a Friday afternoon! > > /mmc status get /dev/mmcblk1 > [ 95.905913] DEBUG: mmc_blk_open: Let's sleep for 10s.. > [ 98.806639] mmc1: card 0007 removed > [ 105.972139] BUG: KASAN: use-after-free in mmc_blk_get+0x58/0xb8 > [ 105.979144] Read of size 4 at addr ffff00000a394a28 by task mmc/180 > [ 105.984945] > [ 105.991209] CPU: 2 PID: 180 Comm: mmc Not tainted > 5.10.0-rc4-00069-gcc758c8c7127-dirty #5 > [ 105.992943] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > [ 106.001010] Call trace: > [ 106.007799] dump_backtrace+0x0/0x2b4 > [ 106.009965] show_stack+0x18/0x6c > [ 106.013779] dump_stack+0xfc/0x168 > [ 106.017083] print_address_description.constprop.0+0x6c/0x488 > [ 106.020384] kasan_report+0x118/0x210 > [ 106.026193] __asan_load4+0x94/0xd0 > [ 106.029844] mmc_blk_get+0x58/0xb8 > [ 106.033141] mmc_blk_open+0x7c/0xdc > [ 106.036614] __blkdev_get+0x3b4/0x964 > [ 106.039996] blkdev_get+0x64/0x100 > [ 106.043815] blkdev_open+0xe8/0x104 > [ 106.047114] do_dentry_open+0x234/0x61c > [ 106.050498] vfs_open+0x54/0x64 > [ 106.054324] path_openat+0xe04/0x1584 > [ 106.057450] do_filp_open+0xe8/0x1e4 > [ 106.061263] do_sys_openat2+0x120/0x230 > [ 106.064911] __arm64_sys_openat+0xf0/0x15c > [ 106.068477] el0_svc_common.constprop.0+0xac/0x234 > [ 106.072639] do_el0_svc+0x84/0xa0 > [ 106.077410] el0_sync_handler+0x264/0x270 > [ 106.080790] el0_sync+0x174/0x180 > [ 106.084768] > [ 106.088070] Allocated by task 33: > [ 106.089647] stack_trace_save+0x9c/0xdc > [ 106.092858] kasan_save_stack+0x28/0x60 > [ 106.096506] __kasan_kmalloc.constprop.0+0xc8/0xf0 > [ 106.100325] kasan_kmalloc+0x10/0x20 > [ 106.105183] mmc_blk_alloc_req+0x94/0x4b0 I see that this is a different IDA managed object, for mmc_blk_ida. Presumably the object allocated is md? md = kzalloc(sizeof(struct mmc_blk_data), GFP_KERNEL); Is that the line? > [ 106.108913] mmc_blk_probe+0x2d4/0xaa4 > [ 106.112829] mmc_bus_probe+0x34/0x4c > [ 106.116471] really_probe+0x148/0x6e0 > [ 106.120202] driver_probe_device+0x78/0xec > [ 106.123764] __device_attach_driver+0x108/0x16c > [ 106.127754] bus_for_each_drv+0xf4/0x15c > [ 106.132180] __device_attach+0x168/0x240 > [ 106.136349] device_initial_probe+0x14/0x20 > [ 106.140253] bus_probe_device+0xec/0x100 > [ 106.144167] device_add+0x55c/0xaf0 > [ 106.148332] mmc_add_card+0x288/0x380 > [ 106.151540] mmc_attach_sd+0x18c/0x22c > [ 106.155363] mmc_rescan+0x444/0x4f0 > [ 106.159014] process_one_work+0x3b8/0x650 > [ 106.162396] worker_thread+0xa0/0x724 > [ 106.166556] kthread+0x218/0x220 > [ 106.170201] ret_from_fork+0x10/0x38 > [ 106.173482] > [ 106.177045] Freed by task 33: > [ 106.178533] stack_trace_save+0x9c/0xdc > [ 106.181399] kasan_save_stack+0x28/0x60 > [ 106.185045] kasan_set_track+0x28/0x40 > [ 106.188868] kasan_set_free_info+0x24/0x4c > [ 106.192684] __kasan_slab_free+0x100/0x180 > [ 106.196764] kasan_slab_free+0x14/0x20 > [ 106.200838] kfree+0xb8/0x46c > [ 106.204583] mmc_blk_put+0xe4/0x11c Looking at mmc_blk_put() and mmc_blk_get() I see static struct mmc_blk_data *mmc_blk_get(struct gendisk *disk) { ... md = disk->private_data; if (md && md->usage == 0) md = NULL; ... } static void mmc_blk_put(struct mmc_blk_data *md) { ... if (md->usage == 0) { int devidx = mmc_get_devidx(md->disk); blk_put_queue(md->queue.queue); ida_simple_remove(&mmc_blk_ida, devidx); put_disk(md->disk); kfree(md); } ... } and notice that mmc_blk_get() takes a gendisk but mmc_blk_put() takes a mmc_blk_data. That's already weird, but then I notice that 'md' comes from disk->private_data in mmc_blk_get() and in mmc_blk_put() we kfree 'md' if usage drops to 0. The storage for md is inside disk->private_data, according to mmc_blk_alloc_req() md->disk->private_data = md so 'md' points to itself through gendisk private_data. Alright. Either way, KASAN is telling us that 'md' got freed in mmc_blk_put() but the gendisk is still around and valid, because it's held alive via some kobject_get(). When we go to blkdev_open() we have that gendisk that has private_data pointing to the freed 'md'. This is a classic dangling pointer bug. Given that mmc_blk_get() is checking for disk->private_data being non-NULL, I looked to see where that is assigned to NULL, but I don't see it. Is it ever set to NULL? We could set private_data to NULL after freeing it, but that feels hacky. The md->usage code looks like a kref design open-coded; probably replace that with a kref that does the code in the if (md->usage == 0) path on the kref release function and then we wouldn't need a mutex around these get/put APIs. Does this patch fix it? ---8<---- diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index d666e24fbe0e..c7939e8fe76f 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -203,6 +203,7 @@ static void mmc_blk_put(struct mmc_blk_data *md) int devidx = mmc_get_devidx(md->disk); blk_put_queue(md->queue.queue); ida_simple_remove(&mmc_blk_ida, devidx); + md->disk->private_data = NULL; put_disk(md->disk); kfree(md); } > [ 106.207624] mmc_blk_remove_req.part.0+0x6c/0xe4 > [ 106.210914] mmc_blk_remove+0x368/0x370 > [ 106.215778] mmc_bus_remove+0x34/0x50 > [ 106.219336] __device_release_driver+0x228/0x31c > [ 106.223155] device_release_driver+0x2c/0x44 > [ 106.227840] bus_remove_device+0x1e4/0x200 > [ 106.232100] device_del+0x2b0/0x770 > [ 106.236005] mmc_remove_card+0xf0/0x150 > [ 106.239383] mmc_sd_detect+0x9c/0x150 > [ 106.243207] mmc_rescan+0x110/0x4f0 > [ 106.247032] process_one_work+0x3b8/0x650 > [ 106.250329] worker_thread+0xa0/0x724 > [ 106.254488] kthread+0x218/0x220 > [ 106.258134] ret_from_fork+0x10/0x38 > [ 106.261416] > [ 106.264986] The buggy address belongs to the object at ffff00000a394800 > [ 106.264986] which belongs to the cache kmalloc-1k of size 1024 > [ 106.266485] The buggy address is located 552 bytes inside of > [ 106.266485] 1024-byte region [ffff00000a394800, ffff00000a394c00) > [ 106.278710] The buggy address belongs to the page: > [ 106.290520] page:00000000ff84ed53 refcount:1 mapcount:0 > mapping:0000000000000000 index:0x0 pfn:0x8a390 > [ 106.295381] head:00000000ff84ed53 order:3 compound_mapcount:0 > compound_pincount:0 > [ 106.304669] flags: 0x3fffc0000010200(slab|head) > [ 106.312234] raw: 03fffc0000010200 dead000000000100 dead000000000122 > ffff000009f03800 > [ 106.316571] raw: 0000000000000000 0000000000100010 00000001ffffffff > 0000000000000000 > [ 106.324537] page dumped because: kasan: bad access detected > [ 106.332254] > [ 106.337543] Memory state around the buggy address: > [ 106.339302] ffff00000a394900: fb fb fb fb fb fb fb fb fb fb fb fb > fb fb fb fb > [ 106.343907] ffff00000a394980: fb fb fb fb fb fb fb fb fb fb fb fb > fb fb fb fb > [ 106.351111] >ffff00000a394a00: fb fb fb fb fb fb fb fb fb fb fb fb > fb fb fb fb > [ 106.358303] ^ > [ 106.365515] ffff00000a394a80: fb fb fb fb fb fb fb fb fb fb fb fb > fb fb fb fb > [ 106.369948] ffff00000a394b00: fb fb fb fb fb fb fb fb fb fb fb fb > fb fb fb fb If you're interested in the kref patch it may also clean it up a bit. I left the mutex in place, but otherwise now the refcount is an atomic variable, so mmc_blk_put() can happen in parallel to mmc_blk_get() and the release part can run later. This lets mmc_blk_get() start failing earlier and returning NULL when the last user has called mmc_blk_put() but the disk is still hanging around. No idea if this works though. ----8<---- diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index d666e24fbe0e..8a300cc2c8be 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -27,6 +27,7 @@ #include <linux/errno.h> #include <linux/hdreg.h> #include <linux/kdev_t.h> +#include <linux/kref.h> #include <linux/blkdev.h> #include <linux/cdev.h> #include <linux/mutex.h> @@ -110,7 +111,7 @@ struct mmc_blk_data { #define MMC_BLK_CMD23 (1 << 0) /* Can do SET_BLOCK_COUNT for multiblock */ #define MMC_BLK_REL_WR (1 << 1) /* MMC Reliable write support */ - unsigned int usage; + struct kref kref; unsigned int read_only; unsigned int part_type; unsigned int reset_done; @@ -180,10 +181,8 @@ static struct mmc_blk_data *mmc_blk_get(struct gendisk *disk) mutex_lock(&open_lock); md = disk->private_data; - if (md && md->usage == 0) + if (md && !kref_get_unless_zero(&md->kref)) md = NULL; - if (md) - md->usage++; mutex_unlock(&open_lock); return md; @@ -195,20 +194,26 @@ static inline int mmc_get_devidx(struct gendisk *disk) return devidx; } -static void mmc_blk_put(struct mmc_blk_data *md) +static void mmc_blk_kref_release(struct kref *ref) { + struct mmc_blk_data *md = container_of(ref, struct mmc_blk_data, kref); + int devidx; + mutex_lock(&open_lock); - md->usage--; - if (md->usage == 0) { - int devidx = mmc_get_devidx(md->disk); - blk_put_queue(md->queue.queue); - ida_simple_remove(&mmc_blk_ida, devidx); - put_disk(md->disk); - kfree(md); - } + devidx = mmc_get_devidx(md->disk); + blk_put_queue(md->queue.queue); + ida_simple_remove(&mmc_blk_ida, devidx); + md->disk->private_data = NULL; + put_disk(md->disk); + kfree(md); mutex_unlock(&open_lock); } +static void mmc_blk_put(struct mmc_blk_data *md) +{ + kref_put(&md->kref, mmc_blk_kref_release); +} + static ssize_t power_ro_lock_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -2300,7 +2305,7 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card, INIT_LIST_HEAD(&md->part); INIT_LIST_HEAD(&md->rpmbs); - md->usage = 1; + kref_init(&md->kref); ret = mmc_init_queue(&md->queue, card); if (ret)