Re: [PATCH] mmc: core: Don't allocate IDA for OF aliases

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 16 Apr 2021 at 19:50, Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
>
> Quoting Ulf Hansson (2021-04-16 00:17:10)
> > On Thu, 15 Apr 2021 at 21:29, Stephen Boyd <swboyd@xxxxxxxxxxxx> wrote:
> > >
> > >
> > > Don't think so. The device (with the kobject inside) is removed, and
> > > thus the mmc1 device will be removed, but the kobject's release function
> > > is delayed due to the config. This means that
> > > mmc_host_classdev_release() is called at a later time. The only thing
> > > inside that function is the IDA removal and the kfree of the container
> > > object. Given that nothing else is in that release function I believe it
> > > is safe to skip IDA allocation as it won't be blocking anything in the
> > > reserved alias case.
> > >
> > > Furthermore, when the device is deleted in mmc_remove_host() there could
> > > be other users of the device that haven't called put_device() yet.
> > > Either way, those other users are keeping the device memory alive, but
> > > otherwise device_del() has unlinked it from the various driver core
> > > lists and sysfs has removed it too so it's in a state where code may be
> > > referencing it but it's on the way out so users of the device will not
> > > be able to do much with it during this time.
> >
> > Right, but see more below.
> >
> > >
> > > This sort of problem (if it exists which I don't think it does) would
> > > have been there all along and can't be fixed at this level. When a
> > > device that has an alias calls the mmc_alloc_host() function twice it
> > > gets two different device structures created so there are two distinct
> > > kobjects that will need to be released at some point. The index is
> > > usually different for those two kobjects, but with aliases it turns out
> > > it is the same. When it comes to registering that device with the same
> > > name the second one will fail because a device with that name already
> > > exists on the bus. This would be really hard to do given that it would
> > > need to be the same aliased device in DT calling the mmc_add_host()
> > > function without calling mmc_remove_host() for the first one it added in
> > > between.
> >
> > In fact, we have a few rare corner cases that can trigger KASAN splats
> > when mmc_remove_host() gets executed. Similar splats can be triggered
> > by just doing a sudden card removal. It's especially related to the
> > cases, where a thread holds a reference to the card/host object when
> > it's being removed. I am working on patches to fix these cases, but
> > haven't yet decided on the best solution.
>
> Ok. Can you share the KASAN reports? The memory allocated for this class
> object and associated index from the IDA will be unique for each call
> the mmc_alloc_host() so I don't see how KASAN could be noticing
> something unless a reference to the host is leaking out without the
> proper get_device() call being made to keep the underlying memory from
> being freed.

Removing the host, also means removing the card. Although, as I said,
I need some more time to think of the best solution.

Here's a report, triggered with some manual hacks and by using the
mmc-utils usesland tool.

/mmc status get /dev/mmcblk1
[   95.905913] DEBUG: mmc_blk_open: Let's sleep for 10s..
[   98.806639] mmc1: card 0007 removed
[  105.972139] BUG: KASAN: use-after-free in mmc_blk_get+0x58/0xb8
[  105.979144] Read of size 4 at addr ffff00000a394a28 by task mmc/180
[  105.984945]
[  105.991209] CPU: 2 PID: 180 Comm: mmc Not tainted
5.10.0-rc4-00069-gcc758c8c7127-dirty #5
[  105.992943] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[  106.001010] Call trace:
[  106.007799]  dump_backtrace+0x0/0x2b4
[  106.009965]  show_stack+0x18/0x6c
[  106.013779]  dump_stack+0xfc/0x168
[  106.017083]  print_address_description.constprop.0+0x6c/0x488
[  106.020384]  kasan_report+0x118/0x210
[  106.026193]  __asan_load4+0x94/0xd0
[  106.029844]  mmc_blk_get+0x58/0xb8
[  106.033141]  mmc_blk_open+0x7c/0xdc
[  106.036614]  __blkdev_get+0x3b4/0x964
[  106.039996]  blkdev_get+0x64/0x100
[  106.043815]  blkdev_open+0xe8/0x104
[  106.047114]  do_dentry_open+0x234/0x61c
[  106.050498]  vfs_open+0x54/0x64
[  106.054324]  path_openat+0xe04/0x1584
[  106.057450]  do_filp_open+0xe8/0x1e4
[  106.061263]  do_sys_openat2+0x120/0x230
[  106.064911]  __arm64_sys_openat+0xf0/0x15c
[  106.068477]  el0_svc_common.constprop.0+0xac/0x234
[  106.072639]  do_el0_svc+0x84/0xa0
[  106.077410]  el0_sync_handler+0x264/0x270
[  106.080790]  el0_sync+0x174/0x180
[  106.084768]
[  106.088070] Allocated by task 33:
[  106.089647]  stack_trace_save+0x9c/0xdc
[  106.092858]  kasan_save_stack+0x28/0x60
[  106.096506]  __kasan_kmalloc.constprop.0+0xc8/0xf0
[  106.100325]  kasan_kmalloc+0x10/0x20
[  106.105183]  mmc_blk_alloc_req+0x94/0x4b0
[  106.108913]  mmc_blk_probe+0x2d4/0xaa4
[  106.112829]  mmc_bus_probe+0x34/0x4c
[  106.116471]  really_probe+0x148/0x6e0
[  106.120202]  driver_probe_device+0x78/0xec
[  106.123764]  __device_attach_driver+0x108/0x16c
[  106.127754]  bus_for_each_drv+0xf4/0x15c
[  106.132180]  __device_attach+0x168/0x240
[  106.136349]  device_initial_probe+0x14/0x20
[  106.140253]  bus_probe_device+0xec/0x100
[  106.144167]  device_add+0x55c/0xaf0
[  106.148332]  mmc_add_card+0x288/0x380
[  106.151540]  mmc_attach_sd+0x18c/0x22c
[  106.155363]  mmc_rescan+0x444/0x4f0
[  106.159014]  process_one_work+0x3b8/0x650
[  106.162396]  worker_thread+0xa0/0x724
[  106.166556]  kthread+0x218/0x220
[  106.170201]  ret_from_fork+0x10/0x38
[  106.173482]
[  106.177045] Freed by task 33:
[  106.178533]  stack_trace_save+0x9c/0xdc
[  106.181399]  kasan_save_stack+0x28/0x60
[  106.185045]  kasan_set_track+0x28/0x40
[  106.188868]  kasan_set_free_info+0x24/0x4c
[  106.192684]  __kasan_slab_free+0x100/0x180
[  106.196764]  kasan_slab_free+0x14/0x20
[  106.200838]  kfree+0xb8/0x46c
[  106.204583]  mmc_blk_put+0xe4/0x11c
[  106.207624]  mmc_blk_remove_req.part.0+0x6c/0xe4
[  106.210914]  mmc_blk_remove+0x368/0x370
[  106.215778]  mmc_bus_remove+0x34/0x50
[  106.219336]  __device_release_driver+0x228/0x31c
[  106.223155]  device_release_driver+0x2c/0x44
[  106.227840]  bus_remove_device+0x1e4/0x200
[  106.232100]  device_del+0x2b0/0x770
[  106.236005]  mmc_remove_card+0xf0/0x150
[  106.239383]  mmc_sd_detect+0x9c/0x150
[  106.243207]  mmc_rescan+0x110/0x4f0
[  106.247032]  process_one_work+0x3b8/0x650
[  106.250329]  worker_thread+0xa0/0x724
[  106.254488]  kthread+0x218/0x220
[  106.258134]  ret_from_fork+0x10/0x38
[  106.261416]
[  106.264986] The buggy address belongs to the object at ffff00000a394800
[  106.264986]  which belongs to the cache kmalloc-1k of size 1024
[  106.266485] The buggy address is located 552 bytes inside of
[  106.266485]  1024-byte region [ffff00000a394800, ffff00000a394c00)
[  106.278710] The buggy address belongs to the page:
[  106.290520] page:00000000ff84ed53 refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0x8a390
[  106.295381] head:00000000ff84ed53 order:3 compound_mapcount:0
compound_pincount:0
[  106.304669] flags: 0x3fffc0000010200(slab|head)
[  106.312234] raw: 03fffc0000010200 dead000000000100 dead000000000122
ffff000009f03800
[  106.316571] raw: 0000000000000000 0000000000100010 00000001ffffffff
0000000000000000
[  106.324537] page dumped because: kasan: bad access detected
[  106.332254]
[  106.337543] Memory state around the buggy address:
[  106.339302]  ffff00000a394900: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[  106.343907]  ffff00000a394980: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[  106.351111] >ffff00000a394a00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[  106.358303]                                   ^
[  106.365515]  ffff00000a394a80: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[  106.369948]  ffff00000a394b00: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[  106.377225] ==================================================================
[  106.384429] Disabling lock debugging due to kernel taint
open: No such device or address

>
> >
> > That's the reason why I was thinking that maybe returning
> > -EPROBE_DEFER could be an option, but certainly I need to think more
> > about this.
>
> I was hoping that you wouldn't need to think more about returning
> -EPROBE_DEFER after my email. :( Returning EPROBE_DEFER looks like it's
> a bandaid for bigger problems with reference counting the pointer
> allocated in mmc_alloc_host(). I hope I convinced you that there isn't
> any danger in reusing the IDA in the case of an alias because the only
> way that is a problem is if the same device calls mmc_alloc_host() twice
> without calling mmc_remove_host() in between. That should be a pretty
> obvious problem in driver code? The check to see if that same device has
> tried to alloc a host twice would still effectively happen after this
> patch because when mmc_add_host() tries to add that second device to the
> driver core it will complain about duplicate device names and fail.

You may very well be correct in you reasoning above, but I just need
to convince myself about it.

>
> Will you apply this patch?

It's likely, but allow me some more time to think about it. To make
sure we do the right thing.

Kind regards
Uffe



[Index of Archives]     [Linux Memonry Technology]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux