Hi Luiz, > > Hi Sherry, > > On Tue, May 10, 2022 at 2:47 AM Sherry Sun <sherry.sun@xxxxxxx> wrote: > > > > Hi, Bluetooth community: > > > > When I try to connect the BLE device to i.MX8mm platform on L5.17.0-rc8 > kernel, sometimes BLE connection may fail, at which point I can observe an > kernel panic (Unable to handle kernel paging request at virtual address > 00078024e000003b), then I enabled the KASAN, found there is a use-after- > free issue in hci_le_conn_failed(), check the following error logs for details. > > There is no such issue with the L5.15 kernel. Has anyone observed the > similar issue on L5.17.0-rc8 kernel? Or have any idea regarding this issue? > Many thanks. > > Interesting, this might be a problem with the controller generating a different > event than expected: > > /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2261: > * > * If this event is unmasked and the HCI_LE_Connection_Complete event > * is unmasked, only the HCI_LE_Enhanced_Connection_Complete event is > * sent when a new connection has been created. > */ > err = __hci_cmd_sync_status_sk(hdev, HCI_OP_LE_CREATE_CONN, > sizeof(cp), &cp, > use_enhanced_conn_complete(hdev) ? > HCI_EV_LE_ENHANCED_CONN_COMPLETE : > HCI_EV_LE_CONN_COMPLETE, > conn->conn_timeout, NULL); > > So if the event is different than expected the event would be processed > _before_ create_le_conn_complete which would likely timeout, so perhaps > we need a special handling to match multiple events instead of just one, > though it worth checking if this is allowed by the spec given that: > > /* BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 1789: > * > * C24: Mandatory if the LE Controller supports Connection State and either > * LE Feature (LL Privacy) or LE Feature (Extended Advertising) is supported > */ #define use_enhanced_conn_complete(dev) (ll_privacy_capable(dev) || \ > ext_adv_capable(dev)) > > Perhaps this is not enough since it appears we are getting the enhanced > version even if this evaluates to false so perhaps we should test something > else, that said we also do: > > /* If the controller supports LL Privacy feature or LE Extended Adv, > * enable the corresponding event. > */ > if (use_enhanced_conn_complete(hdev)) > events[1] |= 0x02; /* LE Enhanced Connection Complete */ > > So it looks like the controller is generating LE Enhanced Connection Complete > even when the host has not unmasked it. I have checked that use_enhanced_conn_complete(hdev) is true for my host, so the controller generate LE Enhanced Connection Complete is the expected behavior. BTW, I have cherry-picked your patch 9b3628d79b46(Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted) to L5.17, found this can avoid the kernel panic(Unable to handle kernel paging request at virtual address dead000000000122), maybe because you removed the hci_conn_del() from hci_le_conn_failed(), but KASAN still reported the same use-after-free issue(BUG: KASAN: use-after-free in hci_le_conn_failed+0x3c8/0x480). I checked the btmon log, seems there is an LE Create Connection Cancel HCI command which cause the LE Enhanced Connection Complete fail, I have no idea how this command is triggered, but with your patch, as there is no kernel panic, the hci_connect_le() is still trying to create the LE connection, and 2 seconds later, the connection has been auto created successfully, and LE Enhanced Connection Complete also show the success status. So the use-after-free issue reported by KASAN maybe caused by the LE Create Connection Cancel command before LE Enhanced Connection Complete return, which cause hci_le_enh_conn_complete_evt() return error status and free the hci_conn struct. btmon logs: < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #1241 2021-11-19 17:24:36.276592 Scanning: Disabled (0x00) Filter duplicates: Disabled (0x00) > HCI Event: Command Complete (0x0e) plen 4 #1242 2021-11-19 17:24:36.277450 LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00) < HCI Command: LE Create Connection (0x08|0x000d) plen 25 #1243 2021-11-19 17:24:36.285151 Scan interval: 60.000 msec (0x0060) Scan window: 60.000 msec (0x0060) Filter policy: White list is not used (0x00) Peer address type: Random (0x01) Peer address: E3:BD:5B:22:90:68 (Static) Own address type: Public (0x00) Min connection interval: 400.00 msec (0x0140) Max connection interval: 480.00 msec (0x0180) Connection latency: 2 (0x0002) Supervision timeout: 6000 msec (0x0258) Min connection length: 0.000 msec (0x0000) Max connection length: 0.000 msec (0x0000) > HCI Event: Command Status (0x0f) plen 4 #1244 2021-11-19 17:24:36.286326 LE Create Connection (0x08|0x000d) ncmd 1 Status: Success (0x00) < HCI Command: LE Create Connection Cancel (0x08|0x000e) plen 0 #1245 2021-11-19 17:24:40.413907 > HCI Event: Command Complete (0x0e) plen 4 #1246 2021-11-19 17:24:40.414858 LE Create Connection Cancel (0x08|0x000e) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 31 #1247 2021-11-19 17:24:40.415118 LE Enhanced Connection Complete (0x0a) Status: Unknown Connection Identifier (0x02) Handle: 0 Role: Master (0x00) Peer address type: Random (0x01) Peer address: E3:BD:5B:22:90:68 (Static) Local resolvable private address: 00:00:00:00:00:00 (Non-Resolvable) Peer resolvable private address: 00:00:00:00:00:00 (Non-Resolvable) Connection interval: 480.00 msec (0x0180) Connection latency: 2 (0x0002) Supervision timeout: 6000 msec (0x0258) Master clock accuracy: 0x00 @ MGMT Event: Connect Failed (0x000d) plen 8 {0x0001} 2021-11-19 17:24:40.415190 LE Address: E3:BD:5B:22:90:68 (Static) Status: Not Connected (0x02) @ MGMT Event: Connect Failed (0x000d) plen 8 {0x0001} 2021-11-19 17:24:41.006451 LE Address: E3:BD:5B:22:90:68 (Static) Status: Failed (0x03) ...(two seconds later) < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #1366 2021-11-19 17:24:43.998501 Scanning: Disabled (0x00) Filter duplicates: Disabled (0x00) > HCI Event: Command Complete (0x0e) plen 4 #1367 2021-11-19 17:24:43.999293 LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00) < HCI Command: LE Create Connection (0x08|0x000d) plen 25 #1368 2021-11-19 17:24:44.007019 Scan interval: 60.000 msec (0x0060) Scan window: 60.000 msec (0x0060) Filter policy: White list is not used (0x00) Peer address type: Random (0x01) Peer address: E3:BD:5B:22:90:68 (Static) Own address type: Public (0x00) Min connection interval: 400.00 msec (0x0140) Max connection interval: 480.00 msec (0x0180) Connection latency: 2 (0x0002) Supervision timeout: 6000 msec (0x0258) Min connection length: 0.000 msec (0x0000) Max connection length: 0.000 msec (0x0000) > HCI Event: Command Status (0x0f) plen 4 #1369 2021-11-19 17:24:44.008290 LE Create Connection (0x08|0x000d) ncmd 1 Status: Success (0x00) > HCI Event: LE Meta Event (0x3e) plen 31 #1370 2021-11-19 17:24:47.843529 LE Enhanced Connection Complete (0x0a) Status: Success (0x00) Handle: 128 Role: Master (0x00) Peer address type: Random (0x01) Peer address: E3:BD:5B:22:90:68 (Static) Local resolvable private address: 00:00:00:00:00:00 (Non-Resolvable) Peer resolvable private address: 00:00:00:00:00:00 (Non-Resolvable) Connection interval: 480.00 msec (0x0180) Connection latency: 2 (0x0002) Supervision timeout: 6000 msec (0x0258) Master clock accuracy: 0x01 @ MGMT Event: Device Connected (0x000b) plen 41 {0x0001} 2021-11-19 17:24:47.843647 LE Address: E3:BD:5B:22:90:68 (Static) Flags: 0x00000008 Connection Locally Initiated Data length: 28 Name (complete): nut Flags: 0x06 LE General Discoverable Mode BR/EDR Not Supported 16-bit Service UUIDs (complete): 1 entry Link Loss (0x1803) Company: Nordic Semiconductor ASA (89) Data: 0010 Service Data (UUID 0x180a): e3bd5b229068 Best regards Sherry > > > Error logs: > > [bluetooth]# connect E3:BD:5B:22:90:68 Attempting to connect to > > E3:BD:5B:22:90:68 [bluetooth]# [ 552.446716] Bluetooth: hci0: Opcode > > 0x200d failed: -110 Failed to connect: org.bluez.Error.Failed [ > > 552.467725] Bluetooth: hci0: request failed to create LE connection: > > err -110 [ 552.481726] > > > ================================================================ > == > > [ 552.488966] BUG: KASAN: use-after-free in > > hci_le_conn_failed+0x4b0/0x5a0 [ 552.495695] Read of size 8 at addr > > ffff0000126da570 by task kworker/u9:0/73 [ 552.502667] > > [ 552.504169] CPU: 2 PID: 73 Comm: kworker/u9:0 Tainted: G O > 5.17.0-rc8-next-20220315-00001-g7c4f5ebecd82-dirty #245 > > [ 552.515834] Hardware name: FSL i.MX8MM EVK board (DT) [ > > 552.520894] Workqueue: hci0 hci_cmd_sync_work [ 552.525275] Call > > trace: > > [ 552.527727] dump_backtrace.part.0+0x210/0x220 [ 552.532190] > > show_stack+0x18/0x6c [ 552.535518] dump_stack_lvl+0x68/0x84 [ > > 552.539199] print_report+0x160/0x5bc [ 552.542878] > > kasan_report+0xa4/0xf0 [ 552.546378] > > __asan_report_load8_noabort+0x3c/0x50 > > [ 552.551185] hci_le_conn_failed+0x4b0/0x5a0 [ 552.555378] > > create_le_conn_complete+0x60/0xb0 [ 552.559836] > > hci_cmd_sync_work+0x128/0x210 [ 552.563945] > > process_one_work+0x60c/0xf04 [ 552.567971] > worker_thread+0x3bc/0xe1c > > [ 552.571731] kthread+0x278/0x2e0 [ 552.574976] > > ret_from_fork+0x10/0x20 [ 552.578568] [ 552.580065] Allocated by > > task 518: > > [ 552.583474] kasan_save_stack+0x2c/0x54 [ 552.587323] > > __kasan_kmalloc+0x90/0xbc [ 552.591086] hci_conn_add+0x70/0xe50 [ > > 552.594670] hci_connect_le_scan+0x23c/0x71c [ 552.598952] > > l2cap_chan_connect+0xbe8/0x11ec [ 552.603238] > > l2cap_sock_connect+0x298/0x57c [ 552.607434] > > __sys_connect+0x1e4/0x240 [ 552.611198] > > __arm64_sys_connect+0x6c/0xa0 [ 552.615306] > > invoke_syscall+0x6c/0x260 [ 552.619073] > > el0_svc_common.constprop.0+0x17c/0x280 > > [ 552.623962] do_el0_svc+0xa0/0xd0 > > [ 552.627292] el0_svc+0x4c/0x8c > > [ 552.630360] el0t_64_sync_handler+0xa4/0x130 [ 552.634649] > > el0t_64_sync+0x18c/0x190 [ 552.638325] [ 552.639821] Freed by task > > 476: > > [ 552.642882] kasan_save_stack+0x2c/0x54 [ 552.646731] > > kasan_set_track+0x2c/0x40 [ 552.650494] > > kasan_set_free_info+0x28/0x50 [ 552.654603] > > __kasan_slab_free+0xec/0x13c [ 552.658625] kfree+0xd4/0x280 [ > > 552.661605] bt_link_release+0x2c/0x3c [ 552.665366] > > device_release+0xe4/0x1f0 [ 552.669131] kobject_put+0x130/0x244 [ > > 552.672722] put_device+0x18/0x30 [ 552.676049] > > hci_conn_cleanup+0x230/0x4d4 [ 552.680068] hci_conn_del+0x46c/0x4a4 > > [ 552.683740] hci_le_conn_failed+0x318/0x5a0 [ 552.687935] > > le_conn_complete_evt.part.0+0x778/0x15a0 > > [ 552.693008] hci_le_enh_conn_complete_evt+0x224/0x330 > > [ 552.698075] hci_le_meta_evt+0x1cc/0x40c [ 552.702011] > > hci_event_packet+0x3e0/0x165c [ 552.706123] hci_rx_work+0x2e0/0x580 > > [ 552.709715] process_one_work+0x60c/0xf04 [ 552.713735] > > worker_thread+0x3bc/0xe1c [ 552.717495] kthread+0x278/0x2e0 [ > > 552.720739] ret_from_fork+0x10/0x20 [ 552.724330] [ 552.725824] > > Last potentially related work creation: > > [ 552.730708] kasan_save_stack+0x2c/0x54 [ 552.734557] > > __kasan_record_aux_stack+0xa0/0xd0 > > [ 552.739100] kasan_record_aux_stack_noalloc+0x14/0x20 > > [ 552.744168] __queue_work+0x428/0xd14 [ 552.747846] > > __queue_delayed_work+0x144/0x220 [ 552.752216] > > queue_delayed_work_on+0xa0/0xe0 [ 552.756498] > > l2cap_chan_del+0x4b8/0xb50 [ 552.760343] l2cap_conn_del+0x254/0x694 > > [ 552.764187] l2cap_connect_cfm+0x698/0xb10 [ 552.768299] > > hci_le_conn_failed+0x2c4/0x5a0 [ 552.772494] > > le_conn_complete_evt.part.0+0x778/0x15a0 > > [ 552.777565] hci_le_enh_conn_complete_evt+0x224/0x330 > > [ 552.782630] hci_le_meta_evt+0x1cc/0x40c [ 552.786567] > > hci_event_packet+0x3e0/0x165c [ 552.790680] hci_rx_work+0x2e0/0x580 > > [ 552.794273] process_one_work+0x60c/0xf04 [ 552.798292] > > worker_thread+0x3bc/0xe1c [ 552.802056] kthread+0x278/0x2e0 [ > > 552.805295] ret_from_fork+0x10/0x20 [ 552.808883] [ 552.810380] > > Second to last potentially related work creation: > > [ 552.816133] kasan_save_stack+0x2c/0x54 [ 552.819984] > > __kasan_record_aux_stack+0xa0/0xd0 > > [ 552.824526] kasan_record_aux_stack_noalloc+0x14/0x20 > > [ 552.829592] __queue_work+0x428/0xd14 [ 552.833264] > > delayed_work_timer_fn+0x68/0xa0 [ 552.837546] > > call_timer_fn.constprop.0+0x54/0x15c > > [ 552.842269] __run_timers.part.0+0x3f4/0x604 [ 552.846553] > > run_timer_softirq+0x78/0x110 [ 552.850580] _stext+0x29c/0x7d8 [ > > 552.853734] [ 552.855229] The buggy address belongs to the object at > > ffff0000126da000 [ 552.855229] which belongs to the cache kmalloc-2k > > of size 2048 [ 552.867759] The buggy address is located 1392 bytes > > inside of [ 552.867759] 2048-byte region [ffff0000126da000, > > ffff0000126da800) [ 552.879683] [ 552.881180] The buggy address > > belongs to the physical page: > > [ 552.886761] page:00000000bc19f159 refcount:1 mapcount:0 > > mapping:0000000000000000 index:0x0 pfn:0x526d8 [ 552.896083] > > head:00000000bc19f159 order:3 compound_mapcount:0 > compound_pincount:0 > > [ 552.903576] flags: > > 0x3fffc0000010200(slab|head|node=0|zone=0|lastcpupid=0xffff) > > [ 552.910910] raw: 03fffc0000010200 dead000000000100 > dead000000000122 > > ffff000008002900 [ 552.918663] raw: 0000000000000000 > 0000000080080008 > > 00000001ffffffff 0000000000000000 [ 552.926413] page dumped because: > > kasan: bad access detected [ 552.931989] [ 552.933486] Memory state > > around the buggy address: > > [ 552.938285] ffff0000126da400: fb fb fb fb fb fb fb fb fb fb fb fb > > fb fb fb fb [ 552.945518] ffff0000126da480: fb fb fb fb fb fb fb fb > > fb fb fb fb fb fb fb fb [ 552.952752] >ffff0000126da500: fb fb fb fb fb fb fb > fb fb fb fb fb fb fb fb fb > > [ 552.959980] ^ > > [ 552.966862] ffff0000126da580: fb fb fb fb fb fb fb fb fb fb fb fb > > fb fb fb fb [ 552.974093] ffff0000126da600: fb fb fb fb fb fb fb fb > > fb fb fb fb fb fb fb fb [ 552.981323] > > > ================================================================ > == > > [ 552.988700] Disabling lock debugging due to kernel taint [ > > 552.994223] hci0 hcon 00000000c12ea080 handle 65535 [ 552.994267] > > Unable to handle kernel paging request at virtual address > > 00078024e000003b [ 553.007176] Mem abort info: > > [ 553.009986] ESR = 0x96000004 > > [ 553.013078] EC = 0x25: DABT (current EL), IL = 32 bits > > [ 553.018460] SET = 0, FnV = 0 > > [ 553.021615] EA = 0, S1PTW = 0 > > [ 553.024845] FSC = 0x04: level 0 translation fault > > [ 553.029772] Data abort info: > > [ 553.032726] ISV = 0, ISS = 0x00000004 > > [ 553.036617] CM = 0, WnR = 0 > > [ 553.039651] [00078024e000003b] address between user and kernel > > address ranges [ 553.046850] Internal error: Oops: 96000004 [#1] > > PREEMPT SMP [ 553.052450] Modules linked in: moal(O) mlan(O) > > [ 553.056948] CPU: 3 PID: 73 Comm: kworker/u9:0 Tainted: G B O > 5.17.0-rc8-next-20220315-00001-g7c4f5ebecd82-dirty #245 > > [ 553.056965] Hardware name: FSL i.MX8MM EVK board (DT) [ > > 553.056974] Workqueue: hci0 hci_cmd_sync_work [ 553.073677] pstate: > > 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 553.085004] > > pc : hci_conn_cleanup+0x128/0x4d4 [ 553.085019] lr : > > hci_conn_cleanup+0x42c/0x4d4 [ 553.093734] sp : ffff80000d797ae0 [ > > 553.093740] x29: ffff80000d797ae0 x28: ffff00000e85f938 x27: > > 0000000000000002 [ 553.104206] x26: 1fffe00001d0bf27 x25: > > ffff000010a78000 x24: ffff000010a78000 [ 553.104227] x23: > > dfff800000000000 x22: ffff0000126da0e8 x21: ffff00000f98d000 [ > > 553.104249] x20: ffff0000126da000 x19: a0400127000001dc x18: > > 0000000000000000 [ 553.118534] x17: ffff80000ac1f008 x16: > > ffff80000ac1ed74 x15: 0000000000000007 [ 553.132821] x14: > > 0000000000000000 x13: 0000000000000007 x12: ffff6000024db41d [ > > 553.139981] x11: 1fffe000024db41c x10: ffff6000024db41c x9 : > > dfff800000000000 [ 553.140000] x8 : ffff0000126da0e4 x7 : > > 0000000000000001 x6 : 0000000000000004 [ 553.154287] x5 : > ffff0000126da0e4 x4 : 0000000000000000 x3 : ffff80000ab714cc [ 553.154318] > x2 : 0000000000000001 x1 : 14080024e000003b x0 : dfff800000000000 > [ 553.168614] Call trace: > > [ 553.168625] hci_conn_cleanup+0x128/0x4d4 [ 553.175094] > > hci_conn_del+0x46c/0x4a4 [ 553.178775] > > hci_le_conn_failed+0x318/0x5a0 [ 553.182970] > > create_le_conn_complete+0x60/0xb0 [ 553.187425] > > hci_cmd_sync_work+0x128/0x210 > > > > Message from syslogd@ at Fri Nov 19 17:28:03 2021 ... > > : Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 553.201344] > > process_one_work+0x60c/0xf04 [ 553.205379] > worker_thread+0x3bc/0xe1c > > [ 553.209137] kthread+0x278/0x2e0 [ 553.212378] > > ret_from_fork+0x10/0x20 [ 553.215973] Code: d343fe61 d2d00000 > > f2fbffe0 f90006b3 (38e06820) [ 553.222079] ---[ end trace > > 0000000000000000 ]--- > > > > Message from syslogd@ at Fri Nov 19 17:28:03 2021 ... > > : Code: d343fe61 d2d00000 f2fbffe0 f90006b3 (38e06820) > > > > Best regards > > Sherry > > > > > -- > Luiz Augusto von Dentz