Hi Sherry, On Mon, May 16, 2022 at 3:56 AM Sherry Sun <sherry.sun@xxxxxxx> wrote: > > Hi Luiz, > > > > > Hi Sherry, > > > > On Tue, May 10, 2022 at 2:47 AM Sherry Sun <sherry.sun@xxxxxxx> wrote: > > > > > > Hi, Bluetooth community: > > > > > > When I try to connect the BLE device to i.MX8mm platform on L5.17.0-rc8 > > kernel, sometimes BLE connection may fail, at which point I can observe an > > kernel panic (Unable to handle kernel paging request at virtual address > > 00078024e000003b), then I enabled the KASAN, found there is a use-after- > > free issue in hci_le_conn_failed(), check the following error logs for details. > > > There is no such issue with the L5.15 kernel. Has anyone observed the > > similar issue on L5.17.0-rc8 kernel? Or have any idea regarding this issue? > > Many thanks. > > > > Interesting, this might be a problem with the controller generating a different > > event than expected: > > > > /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2261: > > * > > * If this event is unmasked and the HCI_LE_Connection_Complete event > > * is unmasked, only the HCI_LE_Enhanced_Connection_Complete event is > > * sent when a new connection has been created. > > */ > > err = __hci_cmd_sync_status_sk(hdev, HCI_OP_LE_CREATE_CONN, > > sizeof(cp), &cp, > > use_enhanced_conn_complete(hdev) ? > > HCI_EV_LE_ENHANCED_CONN_COMPLETE : > > HCI_EV_LE_CONN_COMPLETE, > > conn->conn_timeout, NULL); > > > > So if the event is different than expected the event would be processed > > _before_ create_le_conn_complete which would likely timeout, so perhaps > > we need a special handling to match multiple events instead of just one, > > though it worth checking if this is allowed by the spec given that: > > > > /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 1789: > > * > > * C24: Mandatory if the LE Controller supports Connection State and either > > * LE Feature (LL Privacy) or LE Feature (Extended Advertising) is supported > > */ #define use_enhanced_conn_complete(dev) (ll_privacy_capable(dev) || \ > > ext_adv_capable(dev)) > > > > Perhaps this is not enough since it appears we are getting the enhanced > > version even if this evaluates to false so perhaps we should test something > > else, that said we also do: > > > > /* If the controller supports LL Privacy feature or LE Extended Adv, > > * enable the corresponding event. > > */ > > if (use_enhanced_conn_complete(hdev)) > > events[1] |= 0x02; /* LE Enhanced Connection Complete */ > > > > So it looks like the controller is generating LE Enhanced Connection Complete > > even when the host has not unmasked it. > > I have checked that use_enhanced_conn_complete(hdev) is true for my host, so the controller generate LE Enhanced Connection Complete is the expected behavior. > BTW, I have cherry-picked your patch 9b3628d79b46(Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted) to L5.17, found this can avoid the kernel panic(Unable to handle kernel paging request at virtual address dead000000000122), maybe because you removed the hci_conn_del() from hci_le_conn_failed(), but KASAN still reported the same use-after-free issue(BUG: KASAN: use-after-free in hci_le_conn_failed+0x3c8/0x480). > > I checked the btmon log, seems there is an LE Create Connection Cancel HCI command which cause the LE Enhanced Connection Complete fail, I have no idea how this command is triggered, but with your patch, as there is no kernel panic, the hci_connect_le() is still trying to create the LE connection, and 2 seconds later, the connection has been auto created successfully, and LE Enhanced Connection Complete also show the success status. > So the use-after-free issue reported by KASAN maybe caused by the LE Create Connection Cancel command before LE Enhanced Connection Complete return, which cause hci_le_enh_conn_complete_evt() return error status and free the hci_conn struct. Could you try with the following change as well: https://patchwork.kernel.org/project/bluetooth/list/?series=641604 > btmon logs: > < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #1241 2021-11-19 17:24:36.276592 > Scanning: Disabled (0x00) > Filter duplicates: Disabled (0x00) > > HCI Event: Command Complete (0x0e) plen 4 #1242 2021-11-19 17:24:36.277450 > LE Set Scan Enable (0x08|0x000c) ncmd 1 > Status: Success (0x00) > < HCI Command: LE Create Connection (0x08|0x000d) plen 25 #1243 2021-11-19 17:24:36.285151 > Scan interval: 60.000 msec (0x0060) > Scan window: 60.000 msec (0x0060) > Filter policy: White list is not used (0x00) > Peer address type: Random (0x01) > Peer address: E3:BD:5B:22:90:68 (Static) > Own address type: Public (0x00) > Min connection interval: 400.00 msec (0x0140) > Max connection interval: 480.00 msec (0x0180) > Connection latency: 2 (0x0002) > Supervision timeout: 6000 msec (0x0258) > Min connection length: 0.000 msec (0x0000) > Max connection length: 0.000 msec (0x0000) > > HCI Event: Command Status (0x0f) plen 4 #1244 2021-11-19 17:24:36.286326 > LE Create Connection (0x08|0x000d) ncmd 1 > Status: Success (0x00) > < HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0 #1245 2021-11-19 17:24:40.413907 > > HCI Event: Command Complete (0x0e) plen 4 #1246 2021-11-19 17:24:40.414858 > LE Create Connection Cancel (0x08|0x000e) ncmd 1 > Status: Success (0x00) > > HCI Event: LE Meta Event (0x3e) plen 31 #1247 2021-11-19 17:24:40.415118 > LE Enhanced Connection Complete (0x0a) > Status: Unknown Connection Identifier (0x02) > Handle: 0 > Role: Master (0x00) > Peer address type: Random (0x01) > Peer address: E3:BD:5B:22:90:68 (Static) > Local resolvable private address: 00:00:00:00:00:00 (Non-Resolvable) > Peer resolvable private address: 00:00:00:00:00:00 (Non-Resolvable) > Connection interval: 480.00 msec (0x0180) > Connection latency: 2 (0x0002) > Supervision timeout: 6000 msec (0x0258) > Master clock accuracy: 0x00 > @ MGMT Event: Connect Failed (0x000d) plen 8 {0x0001} 2021-11-19 17:24:40.415190 > LE Address: E3:BD:5B:22:90:68 (Static) > Status: Not Connected (0x02) > @ MGMT Event: Connect Failed (0x000d) plen 8 {0x0001} 2021-11-19 17:24:41.006451 > LE Address: E3:BD:5B:22:90:68 (Static) > Status: Failed (0x03) > > ...(two seconds later) > > < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #1366 2021-11-19 17:24:43.998501 > Scanning: Disabled (0x00) > Filter duplicates: Disabled (0x00) > > HCI Event: Command Complete (0x0e) plen 4 #1367 2021-11-19 17:24:43.999293 > LE Set Scan Enable (0x08|0x000c) ncmd 1 > Status: Success (0x00) > < HCI Command: LE Create Connection (0x08|0x000d) plen 25 #1368 2021-11-19 17:24:44.007019 > Scan interval: 60.000 msec (0x0060) > Scan window: 60.000 msec (0x0060) > Filter policy: White list is not used (0x00) > Peer address type: Random (0x01) > Peer address: E3:BD:5B:22:90:68 (Static) > Own address type: Public (0x00) > Min connection interval: 400.00 msec (0x0140) > Max connection interval: 480.00 msec (0x0180) > Connection latency: 2 (0x0002) > Supervision timeout: 6000 msec (0x0258) > Min connection length: 0.000 msec (0x0000) > Max connection length: 0.000 msec (0x0000) > > HCI Event: Command Status (0x0f) plen 4 #1369 2021-11-19 17:24:44.008290 > LE Create Connection (0x08|0x000d) ncmd 1 > Status: Success (0x00) > > HCI Event: LE Meta Event (0x3e) plen 31 #1370 2021-11-19 17:24:47.843529 > LE Enhanced Connection Complete (0x0a) > Status: Success (0x00) > Handle: 128 > Role: Master (0x00) > Peer address type: Random (0x01) > Peer address: E3:BD:5B:22:90:68 (Static) > Local resolvable private address: 00:00:00:00:00:00 (Non-Resolvable) > Peer resolvable private address: 00:00:00:00:00:00 (Non-Resolvable) > Connection interval: 480.00 msec (0x0180) > Connection latency: 2 (0x0002) > Supervision timeout: 6000 msec (0x0258) > Master clock accuracy: 0x01 > @ MGMT Event: Device Connected (0x000b) plen 41 {0x0001} 2021-11-19 17:24:47.843647 > LE Address: E3:BD:5B:22:90:68 (Static) > Flags: 0x00000008 > Connection Locally Initiated > Data length: 28 > Name (complete): nut > Flags: 0x06 > LE General Discoverable Mode > BR/EDR Not Supported > 16-bit Service UUIDs (complete): 1 entry > Link Loss (0x1803) > Company: Nordic Semiconductor ASA (89) > Data: 0010 > Service Data (UUID 0x180a): e3bd5b229068 > > Best regards > Sherry > > > > > > Error logs: > > > [bluetooth]# connect E3:BD:5B:22:90:68 Attempting to connect to > > > E3:BD:5B:22:90:68 [bluetooth]# [ 552.446716] Bluetooth: hci0: Opcode > > > 0x200d failed: -110 Failed to connect: org.bluez.Error.Failed [ > > > 552.467725] Bluetooth: hci0: request failed to create LE connection: > > > err -110 [ 552.481726] > > > > > ================================================================ > > == > > > [ 552.488966] BUG: KASAN: use-after-free in > > > hci_le_conn_failed+0x4b0/0x5a0 [ 552.495695] Read of size 8 at addr > > > ffff0000126da570 by task kworker/u9:0/73 [ 552.502667] > > > [ 552.504169] CPU: 2 PID: 73 Comm: kworker/u9:0 Tainted: G O > > 5.17.0-rc8-next-20220315-00001-g7c4f5ebecd82-dirty #245 > > > [ 552.515834] Hardware name: FSL i.MX8MM EVK board (DT) [ > > > 552.520894] Workqueue: hci0 hci_cmd_sync_work [ 552.525275] Call > > > trace: > > > [ 552.527727] dump_backtrace.part.0+0x210/0x220 [ 552.532190] > > > show_stack+0x18/0x6c [ 552.535518] dump_stack_lvl+0x68/0x84 [ > > > 552.539199] print_report+0x160/0x5bc [ 552.542878] > > > kasan_report+0xa4/0xf0 [ 552.546378] > > > __asan_report_load8_noabort+0x3c/0x50 > > > [ 552.551185] hci_le_conn_failed+0x4b0/0x5a0 [ 552.555378] > > > create_le_conn_complete+0x60/0xb0 [ 552.559836] > > > hci_cmd_sync_work+0x128/0x210 [ 552.563945] > > > process_one_work+0x60c/0xf04 [ 552.567971] > > worker_thread+0x3bc/0xe1c > > > [ 552.571731] kthread+0x278/0x2e0 [ 552.574976] > > > ret_from_fork+0x10/0x20 [ 552.578568] [ 552.580065] Allocated by > > > task 518: > > > [ 552.583474] kasan_save_stack+0x2c/0x54 [ 552.587323] > > > __kasan_kmalloc+0x90/0xbc [ 552.591086] hci_conn_add+0x70/0xe50 [ > > > 552.594670] hci_connect_le_scan+0x23c/0x71c [ 552.598952] > > > l2cap_chan_connect+0xbe8/0x11ec [ 552.603238] > > > l2cap_sock_connect+0x298/0x57c [ 552.607434] > > > __sys_connect+0x1e4/0x240 [ 552.611198] > > > __arm64_sys_connect+0x6c/0xa0 [ 552.615306] > > > invoke_syscall+0x6c/0x260 [ 552.619073] > > > el0_svc_common.constprop.0+0x17c/0x280 > > > [ 552.623962] do_el0_svc+0xa0/0xd0 > > > [ 552.627292] el0_svc+0x4c/0x8c > > > [ 552.630360] el0t_64_sync_handler+0xa4/0x130 [ 552.634649] > > > el0t_64_sync+0x18c/0x190 [ 552.638325] [ 552.639821] Freed by task > > > 476: > > > [ 552.642882] kasan_save_stack+0x2c/0x54 [ 552.646731] > > > kasan_set_track+0x2c/0x40 [ 552.650494] > > > kasan_set_free_info+0x28/0x50 [ 552.654603] > > > __kasan_slab_free+0xec/0x13c [ 552.658625] kfree+0xd4/0x280 [ > > > 552.661605] bt_link_release+0x2c/0x3c [ 552.665366] > > > device_release+0xe4/0x1f0 [ 552.669131] kobject_put+0x130/0x244 [ > > > 552.672722] put_device+0x18/0x30 [ 552.676049] > > > hci_conn_cleanup+0x230/0x4d4 [ 552.680068] hci_conn_del+0x46c/0x4a4 > > > [ 552.683740] hci_le_conn_failed+0x318/0x5a0 [ 552.687935] > > > le_conn_complete_evt.part.0+0x778/0x15a0 > > > [ 552.693008] hci_le_enh_conn_complete_evt+0x224/0x330 > > > [ 552.698075] hci_le_meta_evt+0x1cc/0x40c [ 552.702011] > > > hci_event_packet+0x3e0/0x165c [ 552.706123] hci_rx_work+0x2e0/0x580 > > > [ 552.709715] process_one_work+0x60c/0xf04 [ 552.713735] > > > worker_thread+0x3bc/0xe1c [ 552.717495] kthread+0x278/0x2e0 [ > > > 552.720739] ret_from_fork+0x10/0x20 [ 552.724330] [ 552.725824] > > > Last potentially related work creation: > > > [ 552.730708] kasan_save_stack+0x2c/0x54 [ 552.734557] > > > __kasan_record_aux_stack+0xa0/0xd0 > > > [ 552.739100] kasan_record_aux_stack_noalloc+0x14/0x20 > > > [ 552.744168] __queue_work+0x428/0xd14 [ 552.747846] > > > __queue_delayed_work+0x144/0x220 [ 552.752216] > > > queue_delayed_work_on+0xa0/0xe0 [ 552.756498] > > > l2cap_chan_del+0x4b8/0xb50 [ 552.760343] l2cap_conn_del+0x254/0x694 > > > [ 552.764187] l2cap_connect_cfm+0x698/0xb10 [ 552.768299] > > > hci_le_conn_failed+0x2c4/0x5a0 [ 552.772494] > > > le_conn_complete_evt.part.0+0x778/0x15a0 > > > [ 552.777565] hci_le_enh_conn_complete_evt+0x224/0x330 > > > [ 552.782630] hci_le_meta_evt+0x1cc/0x40c [ 552.786567] > > > hci_event_packet+0x3e0/0x165c [ 552.790680] hci_rx_work+0x2e0/0x580 > > > [ 552.794273] process_one_work+0x60c/0xf04 [ 552.798292] > > > worker_thread+0x3bc/0xe1c [ 552.802056] kthread+0x278/0x2e0 [ > > > 552.805295] ret_from_fork+0x10/0x20 [ 552.808883] [ 552.810380] > > > Second to last potentially related work creation: > > > [ 552.816133] kasan_save_stack+0x2c/0x54 [ 552.819984] > > > __kasan_record_aux_stack+0xa0/0xd0 > > > [ 552.824526] kasan_record_aux_stack_noalloc+0x14/0x20 > > > [ 552.829592] __queue_work+0x428/0xd14 [ 552.833264] > > > delayed_work_timer_fn+0x68/0xa0 [ 552.837546] > > > call_timer_fn.constprop.0+0x54/0x15c > > > [ 552.842269] __run_timers.part.0+0x3f4/0x604 [ 552.846553] > > > run_timer_softirq+0x78/0x110 [ 552.850580] _stext+0x29c/0x7d8 [ > > > 552.853734] [ 552.855229] The buggy address belongs to the object at > > > ffff0000126da000 [ 552.855229] which belongs to the cache kmalloc-2k > > > of size 2048 [ 552.867759] The buggy address is located 1392 bytes > > > inside of [ 552.867759] 2048-byte region [ffff0000126da000, > > > ffff0000126da800) [ 552.879683] [ 552.881180] The buggy address > > > belongs to the physical page: > > > [ 552.886761] page:00000000bc19f159 refcount:1 mapcount:0 > > > mapping:0000000000000000 index:0x0 pfn:0x526d8 [ 552.896083] > > > head:00000000bc19f159 order:3 compound_mapcount:0 > > compound_pincount:0 > > > [ 552.903576] flags: > > > 0x3fffc0000010200(slab|head|node=0|zone=0|lastcpupid=0xffff) > > > [ 552.910910] raw: 03fffc0000010200 dead000000000100 > > dead000000000122 > > > ffff000008002900 [ 552.918663] raw: 0000000000000000 > > 0000000080080008 > > > 00000001ffffffff 0000000000000000 [ 552.926413] page dumped because: > > > kasan: bad access detected [ 552.931989] [ 552.933486] Memory state > > > around the buggy address: > > > [ 552.938285] ffff0000126da400: fb fb fb fb fb fb fb fb fb fb fb fb > > > fb fb fb fb [ 552.945518] ffff0000126da480: fb fb fb fb fb fb fb fb > > > fb fb fb fb fb fb fb fb [ 552.952752] >ffff0000126da500: fb fb fb fb fb fb fb > > fb fb fb fb fb fb fb fb fb > > > [ 552.959980] ^ > > > [ 552.966862] ffff0000126da580: fb fb fb fb fb fb fb fb fb fb fb fb > > > fb fb fb fb [ 552.974093] ffff0000126da600: fb fb fb fb fb fb fb fb > > > fb fb fb fb fb fb fb fb [ 552.981323] > > > > > ================================================================ > > == > > > [ 552.988700] Disabling lock debugging due to kernel taint [ > > > 552.994223] hci0 hcon 00000000c12ea080 handle 65535 [ 552.994267] > > > Unable to handle kernel paging request at virtual address > > > 00078024e000003b [ 553.007176] Mem abort info: > > > [ 553.009986] ESR = 0x96000004 > > > [ 553.013078] EC = 0x25: DABT (current EL), IL = 32 bits > > > [ 553.018460] SET = 0, FnV = 0 > > > [ 553.021615] EA = 0, S1PTW = 0 > > > [ 553.024845] FSC = 0x04: level 0 translation fault > > > [ 553.029772] Data abort info: > > > [ 553.032726] ISV = 0, ISS = 0x00000004 > > > [ 553.036617] CM = 0, WnR = 0 > > > [ 553.039651] [00078024e000003b] address between user and kernel > > > address ranges [ 553.046850] Internal error: Oops: 96000004 [#1] > > > PREEMPT SMP [ 553.052450] Modules linked in: moal(O) mlan(O) > > > [ 553.056948] CPU: 3 PID: 73 Comm: kworker/u9:0 Tainted: G B O > > 5.17.0-rc8-next-20220315-00001-g7c4f5ebecd82-dirty #245 > > > [ 553.056965] Hardware name: FSL i.MX8MM EVK board (DT) [ > > > 553.056974] Workqueue: hci0 hci_cmd_sync_work [ 553.073677] pstate: > > > 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 553.085004] > > > pc : hci_conn_cleanup+0x128/0x4d4 [ 553.085019] lr : > > > hci_conn_cleanup+0x42c/0x4d4 [ 553.093734] sp : ffff80000d797ae0 [ > > > 553.093740] x29: ffff80000d797ae0 x28: ffff00000e85f938 x27: > > > 0000000000000002 [ 553.104206] x26: 1fffe00001d0bf27 x25: > > > ffff000010a78000 x24: ffff000010a78000 [ 553.104227] x23: > > > dfff800000000000 x22: ffff0000126da0e8 x21: ffff00000f98d000 [ > > > 553.104249] x20: ffff0000126da000 x19: a0400127000001dc x18: > > > 0000000000000000 [ 553.118534] x17: ffff80000ac1f008 x16: > > > ffff80000ac1ed74 x15: 0000000000000007 [ 553.132821] x14: > > > 0000000000000000 x13: 0000000000000007 x12: ffff6000024db41d [ > > > 553.139981] x11: 1fffe000024db41c x10: ffff6000024db41c x9 : > > > dfff800000000000 [ 553.140000] x8 : ffff0000126da0e4 x7 : > > > 0000000000000001 x6 : 0000000000000004 [ 553.154287] x5 : > > ffff0000126da0e4 x4 : 0000000000000000 x3 : ffff80000ab714cc [ 553.154318] > > x2 : 0000000000000001 x1 : 14080024e000003b x0 : dfff800000000000 > > [ 553.168614] Call trace: > > > [ 553.168625] hci_conn_cleanup+0x128/0x4d4 [ 553.175094] > > > hci_conn_del+0x46c/0x4a4 [ 553.178775] > > > hci_le_conn_failed+0x318/0x5a0 [ 553.182970] > > > create_le_conn_complete+0x60/0xb0 [ 553.187425] > > > hci_cmd_sync_work+0x128/0x210 > > > > > > Message from syslogd@ at Fri Nov 19 17:28:03 2021 ... > > > : Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 553.201344] > > > process_one_work+0x60c/0xf04 [ 553.205379] > > worker_thread+0x3bc/0xe1c > > > [ 553.209137] kthread+0x278/0x2e0 [ 553.212378] > > > ret_from_fork+0x10/0x20 [ 553.215973] Code: d343fe61 d2d00000 > > > f2fbffe0 f90006b3 (38e06820) [ 553.222079] ---[ end trace > > > 0000000000000000 ]--- > > > > > > Message from syslogd@ at Fri Nov 19 17:28:03 2021 ... > > > : Code: d343fe61 d2d00000 f2fbffe0 f90006b3 (38e06820) > > > > > > Best regards > > > Sherry > > > > > > > > > -- > > Luiz Augusto von Dentz -- Luiz Augusto von Dentz