Hi Dmitry, On Sat, Nov 2, 2024 at 5:51 AM Dmitry Antipov <dmantipov@xxxxxxxxx> wrote: > > On 11/1/24 8:37 PM, Luiz Augusto von Dentz wrote: > > > Looks like this doesn't solve the problem, in fact I think you are > > getting it backwards, you are trying to reparent the parent dev not > > the child and I assume by destroying the parent device there should be > > some way to reset the parent which seems to be the intent the > > following code in hci_conn_del_sysfs: > > > > while (1) { > > struct device *dev; > > > > dev = device_find_child(&conn->dev, NULL, __match_tty); > > if (!dev) > > break; > > device_move(dev, NULL, DPM_ORDER_DEV_LAST); > > put_device(dev); > > } > > > > But note that it only does that after matching tty, but I guess we > > want to do it regardless otherwise we may have the child objects still > > access it, that said we should probably use device_for_each_child > > though if that is safe to do calls to device_move under its callback. > > I'm not sure that I've got your point. IIUC the problem is that a controller > (parent) instance may be freed _after_ the child (connection) has passed > 'device_unregister(&conn->dev)' but _before_ an actual removal of 'conn->dev' > from the devices hierarchy, thus leaving the dangling 'conn->dev.parent' > pointer. The latter may be fixed by reparenting 'conn->dev' to NULL explicitly. > And handling children of 'conn->dev' (i.e. the grandchilren of the controller) > is out of this problem's scope at all. > > And nothing to say about syzbot's innards but manual testing shows that the > following thing: > > void hci_conn_del_sysfs(struct hci_conn *conn) > { > struct hci_dev *hdev = conn->hdev; > > bt_dev_dbg(hdev, "conn %p", conn); > > if (!device_is_registered(&conn->dev)) { > /* If device_add() has *not* succeeded, use *only* put_device() > * to drop the reference count. > */ > put_device(&conn->dev); > return; > } > > while (1) { > struct device *dev; > > dev = device_find_any_child(&conn->dev); > if (!dev) > break; > printk(KERN_ERR "%s:%d: reparent dev@%p(%s) with parent@%p(%s)\n", > __FILE__, __LINE__, dev, dev_name(dev), dev->parent, > (dev->parent ? dev_name(dev->parent) : "<none>")); > device_move(dev, NULL, DPM_ORDER_DEV_LAST); > put_device(dev); > } > > device_unregister(&conn->dev); > } > > occasionally triggers the following crash: > > net/bluetooth/hci_sysfs.c:82: reparent dev@ffff888114be86f8(bnep0) with parent@ffff888111c64b68(hci4:200) > Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASI > KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f] > CPU: 3 UID: 0 PID: 6033 Comm: repro Not tainted 6.12.0-rc5-00299-g11066801dd4b-dirty #8 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 > RIP: 0010:klist_put+0x4d/0x1d0 > Code: c1 ea 03 80 3c 02 00 0f 85 74 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 23 49 83 e4 fe 49 8d 7c 24 58 49 > RSP: 0018:ffffc9000423f9b0 EFLAGS: 00010202 > RAX: dffffc0000000000 RBX: ffff88810f43ac60 RCX: 0000000000000000 > RDX: 000000000000000b RSI: ffffffff8a92d415 RDI: 0000000000000058 > RBP: 0000000000000001 R08: 0000000000000000 R09: fffffbfff1fad62c > R10: ffffffff8fd6b163 R11: 0000000000000000 R12: 0000000000000000 > R13: 0000000000000001 R14: ffffc9000423fa30 R15: ffffffff8a92d805 > FS: 00007f24ebb78740(0000) GS:ffff888135f00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000055c52afb6950 CR3: 0000000020c1c000 CR4: 00000000000006f0 > Call Trace: > <TASK> > ? die_addr.cold+0x8/0xd > ? exc_general_protection+0x147/0x240 > ? asm_exc_general_protection+0x26/0x30 > ? klist_remove+0x155/0x2b0 > ? klist_put+0x15/0x1d0 > ? klist_put+0x4d/0x1d0 > klist_remove+0x15a/0x2b0 > ? __pfx_klist_remove+0x10/0x10 > device_move+0x12d/0x10b0 > hci_conn_del_sysfs.cold+0xcf/0x14a > hci_conn_del+0x467/0xd60 > hci_conn_hash_flush+0x18f/0x270 > hci_dev_close_sync+0x549/0x1260 > hci_dev_do_close+0x2e/0x90 > hci_unregister_dev+0x213/0x630 > vhci_release+0x79/0xf0 > ? __pfx_vhci_release+0x10/0x10 > __fput+0x3f6/0xb30 > task_work_run+0x151/0x250 > ? __pfx_task_work_run+0x10/0x10 > do_exit+0xa79/0x2c30 > ? do_raw_spin_lock+0x12a/0x2b0 > ? __pfx_do_exit+0x10/0x10 > do_group_exit+0xd5/0x2a0 > __x64_sys_exit_group+0x3e/0x50 > x64_sys_call+0x14af/0x14b0 > do_syscall_64+0xc7/0x250 > entry_SYSCALL_64_after_hwframe+0x77/0x7f Weird, this was not reproduced by syzbot when I asked it to test, how are you reproducing this? This looks like to be a problem with klist being corrupted somehow, anyway if we can't unparent the objects we need a way to unregister the child without waiting for the bnep thread to clean it up. > Dmitry -- Luiz Augusto von Dentz