On Tue, 13 Dec 2022 at 15:17, Yang Yingliang <yangyingliang@xxxxxxxxxx> wrote: > > If sdio_add_func() or sdio_init_func() fails, sdio_remove_func() can > not release the resources, because the sdio function is not presented > in these two cases, it won't call of_node_put() or put_device(). > > To fix these leaks, make sdio_func_present() only control whether > device_del() needs to be called or not, then always call of_node_put() > and put_device(). > > In error case in sdio_init_func(), the reference of 'card->dev' is > not get, to avoid redundant put in sdio_free_func_cis(), move the > get_device() to sdio_alloc_func() and put_device() to sdio_release_func(), > it can keep the get/put function balanced. > > Without this patch, while doing fault inject test, it can get the > following leak reports, after this fix, the leak is gone. > > unreferenced object 0xffff888112514000 (size 2048): > comm "kworker/3:2", pid 65, jiffies 4294741614 (age 124.774s) > hex dump (first 32 bytes): > 00 e0 6f 12 81 88 ff ff 60 58 8d 06 81 88 ff ff ..o.....`X...... > 10 40 51 12 81 88 ff ff 10 40 51 12 81 88 ff ff .@Q......@Q..... > backtrace: > [<000000009e5931da>] kmalloc_trace+0x21/0x110 > [<000000002f839ccb>] mmc_alloc_card+0x38/0xb0 [mmc_core] > [<0000000004adcbf6>] mmc_sdio_init_card+0xde/0x170 [mmc_core] > [<000000007538fea0>] mmc_attach_sdio+0xcb/0x1b0 [mmc_core] > [<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core] > > unreferenced object 0xffff888112511000 (size 2048): > comm "kworker/3:2", pid 65, jiffies 4294741623 (age 124.766s) > hex dump (first 32 bytes): > 00 40 51 12 81 88 ff ff e0 58 8d 06 81 88 ff ff .@Q......X...... > 10 10 51 12 81 88 ff ff 10 10 51 12 81 88 ff ff ..Q.......Q..... > backtrace: > [<000000009e5931da>] kmalloc_trace+0x21/0x110 > [<00000000fcbe706c>] sdio_alloc_func+0x35/0x100 [mmc_core] > [<00000000c68f4b50>] mmc_attach_sdio.cold.18+0xb1/0x395 [mmc_core] > [<00000000d4fdeba7>] mmc_rescan+0x54a/0x640 [mmc_core] > Thanks for the detailed description, nice! > Fixes: 25185f3f31c9 ("mmc: Add SDIO function devicetree subnode parsing") This looks wrong, it's not really that commit that introduces the problem. It existed way before this. > Fixes: 3d10a1ba0d37 ("sdio: fix reference counting in sdio_remove_func()") Even if the problem is really old, I am worried that we may introduce other problems if $subject patch gets applied as is, to older stable kernels that carry the above commit. Did you have a look at this too? Unless you propose something, I will try to figure out the best way forward here. > Signed-off-by: Yang Yingliang <yangyingliang@xxxxxxxxxx> > --- > v4 -> v5: > Merge to two pathes in one and add leak reports. > Fix wrong check in sdio_remove_func(). > Move get/put_device() in sdio_alloc/release_func. > > v3 -> v4: > Drop patch1, keep calling put_device() to free memory, > set 'func->card' to NULL to avoid redundant put. > > v2 -> v3: > Change to call of_node_put() in remove() function to > fix node refcount leak. > > v1 -> v2: > Fix compile error in patch #2. > --- > drivers/mmc/core/sdio_bus.c | 20 ++++++++++++++++---- > drivers/mmc/core/sdio_cis.c | 12 ------------ > 2 files changed, 16 insertions(+), 16 deletions(-) > > diff --git a/drivers/mmc/core/sdio_bus.c b/drivers/mmc/core/sdio_bus.c > index babf21a0adeb..1ba135cd4caa 100644 > --- a/drivers/mmc/core/sdio_bus.c > +++ b/drivers/mmc/core/sdio_bus.c > @@ -291,8 +291,14 @@ static void sdio_release_func(struct device *dev) > { > struct sdio_func *func = dev_to_sdio_func(dev); > > - if (!(func->card->quirks & MMC_QUIRK_NONSTD_SDIO)) > + if (!(func->card->quirks & MMC_QUIRK_NONSTD_SDIO)) { > sdio_free_func_cis(func); > + /* > + * We have now removed the link to the tuples in the > + * card structure, so remove the reference. > + */ > + put_device(&func->card->dev); In fact, we are relying on the card's struct device itself, even if the MMC_QUIRK_NONSTD_SDIO is set or not. More importantly, we are relying on the "card" before device_add() (that helps us to manage parent/child relationships) is getting called for the sdio_func's struct device. Therefore, I think we should call put_device() here, no matter whether MMC_QUIRK_NONSTD_SDIO is set or not. > + } > > kfree(func->info); > kfree(func->tmpbuf); > @@ -324,6 +330,13 @@ struct sdio_func *sdio_alloc_func(struct mmc_card *card) > > device_initialize(&func->dev); > > + /* > + * We may link to tuples in the card structure, > + * we need make sure we have a reference to it. > + */ > + if (!(func->card->quirks & MMC_QUIRK_NONSTD_SDIO)) For the similar reasons as above, we should call get_device() here, no matter whether MMC_QUIRK_NONSTD_SDIO is set or not. > + get_device(&func->card->dev); > + > func->dev.parent = &card->dev; > func->dev.bus = &sdio_bus_type; > func->dev.release = sdio_release_func; > @@ -377,10 +390,9 @@ int sdio_add_func(struct sdio_func *func) > */ > void sdio_remove_func(struct sdio_func *func) > { > - if (!sdio_func_present(func)) > - return; > + if (sdio_func_present(func)) > + device_del(&func->dev); > > - device_del(&func->dev); > of_node_put(func->dev.of_node); > put_device(&func->dev); > } > diff --git a/drivers/mmc/core/sdio_cis.c b/drivers/mmc/core/sdio_cis.c > index a705ba6eff5b..afaa6cab1adc 100644 > --- a/drivers/mmc/core/sdio_cis.c > +++ b/drivers/mmc/core/sdio_cis.c > @@ -403,12 +403,6 @@ int sdio_read_func_cis(struct sdio_func *func) > if (ret) > return ret; > > - /* > - * Since we've linked to tuples in the card structure, > - * we must make sure we have a reference to it. > - */ > - get_device(&func->card->dev); > - > /* > * Vendor/device id is optional for function CIS, so > * copy it from the card structure as needed. > @@ -434,11 +428,5 @@ void sdio_free_func_cis(struct sdio_func *func) > } > > func->tuples = NULL; > - > - /* > - * We have now removed the link to the tuples in the > - * card structure, so remove the reference. > - */ > - put_device(&func->card->dev); > } > Kind regards Uffe