On 11/1/24 8:37 PM, Luiz Augusto von Dentz wrote:
Looks like this doesn't solve the problem, in fact I think you are getting it backwards, you are trying to reparent the parent dev not the child and I assume by destroying the parent device there should be some way to reset the parent which seems to be the intent the following code in hci_conn_del_sysfs: while (1) { struct device *dev; dev = device_find_child(&conn->dev, NULL, __match_tty); if (!dev) break; device_move(dev, NULL, DPM_ORDER_DEV_LAST); put_device(dev); } But note that it only does that after matching tty, but I guess we want to do it regardless otherwise we may have the child objects still access it, that said we should probably use device_for_each_child though if that is safe to do calls to device_move under its callback.
I'm not sure that I've got your point. IIUC the problem is that a controller (parent) instance may be freed _after_ the child (connection) has passed 'device_unregister(&conn->dev)' but _before_ an actual removal of 'conn->dev' from the devices hierarchy, thus leaving the dangling 'conn->dev.parent' pointer. The latter may be fixed by reparenting 'conn->dev' to NULL explicitly. And handling children of 'conn->dev' (i.e. the grandchilren of the controller) is out of this problem's scope at all. And nothing to say about syzbot's innards but manual testing shows that the following thing: void hci_conn_del_sysfs(struct hci_conn *conn) { struct hci_dev *hdev = conn->hdev; bt_dev_dbg(hdev, "conn %p", conn); if (!device_is_registered(&conn->dev)) { /* If device_add() has *not* succeeded, use *only* put_device() * to drop the reference count. */ put_device(&conn->dev); return; } while (1) { struct device *dev; dev = device_find_any_child(&conn->dev); if (!dev) break; printk(KERN_ERR "%s:%d: reparent dev@%p(%s) with parent@%p(%s)\n", __FILE__, __LINE__, dev, dev_name(dev), dev->parent, (dev->parent ? dev_name(dev->parent) : "<none>")); device_move(dev, NULL, DPM_ORDER_DEV_LAST); put_device(dev); } device_unregister(&conn->dev); } occasionally triggers the following crash: net/bluetooth/hci_sysfs.c:82: reparent dev@ffff888114be86f8(bnep0) with parent@ffff888111c64b68(hci4:200) Oops: general protection fault, probably for non-canonical address 0xdffffc000000000b: 0000 [#1] PREEMPT SMP KASI KASAN: null-ptr-deref in range [0x0000000000000058-0x000000000000005f] CPU: 3 UID: 0 PID: 6033 Comm: repro Not tainted 6.12.0-rc5-00299-g11066801dd4b-dirty #8 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 RIP: 0010:klist_put+0x4d/0x1d0 Code: c1 ea 03 80 3c 02 00 0f 85 74 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 23 49 83 e4 fe 49 8d 7c 24 58 49 RSP: 0018:ffffc9000423f9b0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff88810f43ac60 RCX: 0000000000000000 RDX: 000000000000000b RSI: ffffffff8a92d415 RDI: 0000000000000058 RBP: 0000000000000001 R08: 0000000000000000 R09: fffffbfff1fad62c R10: ffffffff8fd6b163 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000001 R14: ffffc9000423fa30 R15: ffffffff8a92d805 FS: 00007f24ebb78740(0000) GS:ffff888135f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055c52afb6950 CR3: 0000000020c1c000 CR4: 00000000000006f0 Call Trace: <TASK> ? die_addr.cold+0x8/0xd ? exc_general_protection+0x147/0x240 ? asm_exc_general_protection+0x26/0x30 ? klist_remove+0x155/0x2b0 ? klist_put+0x15/0x1d0 ? klist_put+0x4d/0x1d0 klist_remove+0x15a/0x2b0 ? __pfx_klist_remove+0x10/0x10 device_move+0x12d/0x10b0 hci_conn_del_sysfs.cold+0xcf/0x14a hci_conn_del+0x467/0xd60 hci_conn_hash_flush+0x18f/0x270 hci_dev_close_sync+0x549/0x1260 hci_dev_do_close+0x2e/0x90 hci_unregister_dev+0x213/0x630 vhci_release+0x79/0xf0 ? __pfx_vhci_release+0x10/0x10 __fput+0x3f6/0xb30 task_work_run+0x151/0x250 ? __pfx_task_work_run+0x10/0x10 do_exit+0xa79/0x2c30 ? do_raw_spin_lock+0x12a/0x2b0 ? __pfx_do_exit+0x10/0x10 do_group_exit+0xd5/0x2a0 __x64_sys_exit_group+0x3e/0x50 x64_sys_call+0x14af/0x14b0 do_syscall_64+0xc7/0x250 entry_SYSCALL_64_after_hwframe+0x77/0x7f Dmitry