On Wed, Nov 17, 2021 at 05:29:07PM -0600, Bjorn Andersson wrote:
> On Tue 02 Nov 18:51 CDT 2021, Sujit Kautkar wrote:
>
> I like Stephen's suggestion about modifying the $subject.
> Also note that the change isn't in the glink driver, so prefix should
> reflect that:
>
> $ git log --oneline --no-decorate -- drivers/rpmsg/rpmsg_char.c
> f998d48f9b3c rpmsg: glink: Update cdev add/del API in rpmsg_ctrldev_release_device()
> bc774a3887cb rpmsg: char: Remove useless include
> 964e8bedd5a1 rpmsg: char: Return an error if device already open
> ...
>
> > Replace cdev add/del APIs with cdev_device_add/cdev_device_del to avoid
> > below kernel warning. This correctly takes a reference to the parent
> > device so the parent will not get released until all references to the
> > cdev are released.
> >
> > | ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x7c
> > | WARNING: CPU: 7 PID: 19892 at lib/debugobjects.c:488 debug_print_object+0x13c/0x1b0
> > | CPU: 7 PID: 19892 Comm: kworker/7:4 Tainted: G W 5.4.147-lockdep #1
> > | ==================================================================
> > | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> > | Workqueue: events kobject_delayed_cleanup
> > | pstate: 60c00009 (nZCv daif +PAN +UAO)
> > | pc : debug_print_object+0x13c/0x1b0
> > | lr : debug_print_object+0x13c/0x1b0
> > | sp : ffffff83b2ec7970
> > | x29: ffffff83b2ec7970 x28: dfffffd000000000
> > | x27: ffffff83d674f000 x26: dfffffd000000000
> > | x25: ffffffd06b8fa660 x24: dfffffd000000000
> > | x23: 0000000000000000 x22: ffffffd06b7c5108
> > | x21: ffffffd06d597860 x20: ffffffd06e2c21c0
> > | x19: ffffffd06d5974c0 x18: 000000000001dad8
> > | x17: 0000000000000000 x16: dfffffd000000000
> > | BUG: KASAN: use-after-free in qcom_glink_rpdev_release+0x54/0x70
> > | x15: ffffffffffffffff x14: 79616c6564203a74
> > | x13: 0000000000000000 x12: 0000000000000080
> > | Write of size 8 at addr ffffff83d95768d0 by task kworker/3:1/150
> > | x11: 0000000000000001 x10: 0000000000000000
> > | x9 : fc9e8edec0ad0300 x8 : fc9e8edec0ad0300
> > |
> > | x7 : 0000000000000000 x6 : 0000000000000000
> > | x5 : 0000000000000080 x4 : 0000000000000000
> > | CPU: 3 PID: 150 Comm: kworker/3:1 Tainted: G W 5.4.147-lockdep #1
> > | x3 : ffffffd06c149574 x2 : ffffff83f77f7498
> > | x1 : ffffffd06d596f60 x0 : 0000000000000061
> > | Hardware name: Google CoachZ (rev1 - 2) with LTE (DT)
> > | Call trace:
> > |  debug_print_object+0x13c/0x1b0
> > | Workqueue: events kobject_delayed_cleanup
> > |  __debug_check_no_obj_freed+0x25c/0x3c0
> > |  debug_check_no_obj_freed+0x18/0x20
> > | Call trace:
> > |  slab_free_freelist_hook+0xb4/0x1bc
> > |  kfree+0xe8/0x2d8
> > |  dump_backtrace+0x0/0x27c
>
> Why is dump_backtrace in the callstack here inbetween
> rpmsg_ctrldev_release_device() and kfree()? Isn't the error that we're
> calling kfree() on an chunk of memory that contains a live object?

When I tried to repro there was no dump_backtrace():

  Call trace:
   debug_print_object+0x13c/0x1b0
   __debug_check_no_obj_freed+0x25c/0x3c0
   debug_check_no_obj_freed+0x18/0x20
   slab_free_freelist_hook+0xbc/0x1e4
   kfree+0xfc/0x2f4
   rpmsg_ctrldev_release_device+0x78/0xb8
   device_release+0x84/0x168
   kobject_cleanup+0x12c/0x298
   kobject_delayed_cleanup+0x10/0x18
   process_one_work+0x578/0x92c
   worker_thread+0x804/0xcf8
   kthread+0x2a8/0x314
   ret_from_fork+0x10/0x18

My guess is that Sujit added a dump_backtrace() for debugging and it was
still there when the backtrace of the commit message was generated. That
would also explain the two 'Call trace:' entries in the log.

> > |  rpmsg_ctrldev_release_device+0x78/0xb8
> > |  device_release+0x68/0x14c
> > |  show_stack+0x20/0x2c
> > |  kobject_cleanup+0x12c/0x298
> > |  kobject_delayed_cleanup+0x10/0x18
> > |  dump_stack+0xe0/0x19c
> > |  process_one_work+0x578/0x92c
> > |  worker_thread+0x804/0xcf8
> > |  print_address_description+0x3c/0x4a8
> > |  kthread+0x2a8/0x314
> > |  ret_from_fork+0x10/0x18
> > |  __kasan_report+0x100/0x124
> >
> > Signed-off-by: Sujit Kautkar <sujitka@xxxxxxxxxxxx>
> > ---
> > Changes in v3:
> > - Remove unecessary error check as per Matthias's comment
> >
> > Changes in v2:
> > - Fix typo in commit message
> >
> >  drivers/rpmsg/rpmsg_char.c | 10 ++--------
> >  1 file changed, 2 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
> > index 876ce43df732b..a6a33155ca859 100644
> > --- a/drivers/rpmsg/rpmsg_char.c
> > +++ b/drivers/rpmsg/rpmsg_char.c
> > @@ -458,7 +458,7 @@ static void rpmsg_ctrldev_release_device(struct device *dev)
> >
> >  	ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
> >  	ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
> > -	cdev_del(&ctrldev->cdev);
> > +	cdev_device_del(&ctrldev->cdev, &ctrldev->dev);
>
> I am not able to find any other instance where cdev_device_del() is
> called from the device's release function itself, which tells me that
> this probably is not the right thing to do. Instead the appropriate way
> seem to put the cdev_device_del() in rpmsg_chrdev_remove().

Yes, it sounds reasonable to me to delete the char device when the
control device is removed.

> That said, we already do device_del() in rpmsg_chrdev_remove() so if the
> warning is trying to tell us that ctrldev->dev is not deleted I think we
> have an unbalanced put_device()?

My understanding is that the situation is analogous to this one:

  commit 1413ef638abae4ab5621901cf4d8ef08a4a48ba6
  Author: Kevin Hao <haokexin@xxxxxxxxx>
  Date:   Fri Oct 11 23:00:14 2019 +0800

      i2c: dev: Fix the race between the release of i2c_dev and cdev

      The struct cdev is embedded in the struct i2c_dev. In the current code,
      we would free the i2c_dev struct directly in put_i2c_dev(), but the
      cdev is manged by a kobject, and the release of it is not predictable.
      So it is very possible that the i2c_dev is freed before the cdev is
      entirely released. We can easily get the following call trace with
      CONFIG_DEBUG_KOBJECT_RELEASE and CONFIG_DEBUG_OBJECTS_TIMERS enabled.

        ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x38
        WARNING: CPU: 19 PID: 1 at lib/debugobjects.c:325 debug_print_object+0xb0/0xf0
        ...

      This is a common issue when using cdev embedded in a struct.
      Fortunately, we already have a mechanism to solve this kind of issue.
      Please see commit 233ed09d7fda ("chardev: add helper function to
      register char devs with a struct device") for more detail.

      In this patch, we choose to embed the struct device into the i2c_dev,
      and use the API provided by the commit 233ed09d7fda to make sure that
      the release of i2c_dev and cdev are in sequence.
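
To make the lifetime argument concrete, here is a small userspace model of
the problem. It is only a sketch, not kernel code: every toy_* name is
made up for illustration, and the "lingering reference" on the cdev stands
in for whatever keeps the cdev kobject alive after removal (an open file,
or the delayed release that CONFIG_DEBUG_KOBJECT_RELEASE adds). With
link_parent set, the toy cdev holds a reference on the containing device
until its own release, which is my understanding of what
cdev_device_add() arranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kobject: refcount, parent, release hook. */
struct toy_kobj {
	int refs;
	struct toy_kobj *parent;	/* reference dropped after our release */
	void (*release)(struct toy_kobj *kobj);
};

/* Hypothetical stand-in for a control device with an embedded cdev. */
struct toy_ctrldev {
	struct toy_kobj dev;
	struct toy_kobj cdev;
	bool released;
	bool released_with_live_cdev;
};

static void toy_get(struct toy_kobj *kobj)
{
	kobj->refs++;
}

static void toy_put(struct toy_kobj *kobj)
{
	struct toy_kobj *parent = kobj->parent;

	assert(kobj->refs > 0);
	if (--kobj->refs > 0)
		return;
	kobj->release(kobj);
	if (parent)
		toy_put(parent);
}

static void toy_cdev_release(struct toy_kobj *kobj)
{
	(void)kobj;	/* embedded object, nothing of its own to free */
}

static void toy_dev_release(struct toy_kobj *kobj)
{
	struct toy_ctrldev *c = (struct toy_ctrldev *)
		((char *)kobj - offsetof(struct toy_ctrldev, dev));

	/* The real release would kfree() here; the toy only records it. */
	c->released = true;
	c->released_with_live_cdev = c->cdev.refs > 0;
}

/* Returns true if the container was released while the cdev was in use. */
static bool toy_release_races(bool link_parent)
{
	struct toy_ctrldev *c = calloc(1, sizeof(*c));
	bool racy;

	assert(c);
	c->dev.refs = 1;
	c->dev.release = toy_dev_release;
	c->cdev.refs = 1;
	c->cdev.release = toy_cdev_release;

	if (link_parent) {
		/* The cdev pins its parent until the cdev is released. */
		c->cdev.parent = &c->dev;
		toy_get(&c->dev);
	}

	toy_get(&c->cdev);	/* lingering reference, e.g. an open file */
	toy_put(&c->cdev);	/* removal: drop the registration reference */
	toy_put(&c->dev);	/* removal: final put on the device */
	toy_put(&c->cdev);	/* lingering reference finally goes away */

	assert(c->released);
	racy = c->released_with_live_cdev;
	free(c);
	return racy;
}
```

In this model toy_release_races(false) reports that the container was
released while the cdev was still referenced, and toy_release_races(true)
reports that it was not, because the device's last reference is only
dropped from the cdev's release path.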