Re: [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Marcel,

On 03/04/17 16:51, Marcel Holtmann wrote:

If this is an issue in 4.10, then lets get this fixed / hardened.


I have created a simple test script that leads to a kernel crash in HCI UART LDISC. Please note that this is a different scenario to the one documented in my previous E-mails but the resulting crash is similar.

I am using a BCSP supported Bluetooth Radio Module connected via a USB to serial interface to a 64-bit x86 Linux laptop.

I built kernel v4.10.5 although I messed up the .config settings so uname -r gave a strange string of4.10.54.10.5_debug+, sigh

Please note that "btattach" fails to establish a BCSP connection with the Bluetooth Radio Module but that is advantageous in exposing a fatal race condition in hci_uart_tty_close(). In this case, the BCSP Data Link protocol layer attempts to retransmit every 250ms and the BCSP layer gets closed down whilst still attempting to retransmit which causes a kernel crash.

My proposed patchset fixes this race condition plus some other issues.

Please refer to my attached tarball v4.10.5_bt_crash_info.tgz, the contents are:
v4.10.5_pure_testing/
v4.10.5_pure_testing/0001-Bluetooth-Debug-shows-how-hci_uart_tty_close-is-call.patch
v4.10.5_pure_testing/v4.10.5_pure_testing/btattach_loop.sh
v4.10.5_pure_testing/enable_hci_bt_logging.sh
v4.10.5_pure_testing/bt_hci_uart_ldisc_crash_snippet_log.txt

The patch file 0001-Bluetooth-Debug-shows-how-hci_uart_tty_close-is-call.patch is just adding debug so you can see that killing "btattach" causes do_group_exit() to run in the kernel which cleans-up by using tty_ldisc_kill() which calls tty_ldisc_close() which causes hci_uart_tty_close() to run.

enable_hci_bt_logging.sh
enables dynamic debug for dmesg although dynamic debug seems unreliable these days at least for me and I have to run enable_hci_bt_logging.sh multiple times to get all the files logging properly.

bt_hci_uart_ldisc_crash_snippet_log.txt
This shows my analysis of 2 example NULL pointer dereference kernel crashes which are easy to reproduce using my test script.

v4.10.5_pure_testing/btattach_loop.sh
This is the simple test script which causes a kernel crash within 10 minutes of running, it is easily repeatable with my equipment. I think the kernel crash occurs before the loop counter reaches 100.

The contents of the script is:

#!/bin/bash

let i=1

while [ true ]
do
        echo "loop $i"
        ./btattach -A /dev/ttyUSB0 -P bcsp &
        sleep 5
        echo "now killing..."
        killall btattach
        i=$(($i + 1))
done

Example kernel crash backtrace:

[ 1561.709737] BUG: unable to handle kernel NULL pointer dereference at 0000000000000044
[ 1561.709829] IP: _raw_spin_lock_irqsave+0x22/0x40
[ 1561.709862] PGD 0

[ 1561.709889] Oops: 0002 [#1] SMP
[ 1561.709911] Modules linked in: ftdi_sio rfcomm hci_uart btqca xt_set ip_set_hash_ip nf_log_ipv6 ip_set nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_nat_ipv6 xt_recent iptable_nat nf_nat_ipv4 ipt_REJECT nf_reject_ipv4 iptable_mangle iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment ip6t_REJECT nf_reject_ipv6 xt_addrtype bridge stp llc xt_mark ip6table_mangle nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp xt_tcpudp nf_nat_sip xt_CT nf_nat_pptp nf_nat_proto_gre nf_nat_irc ip6table_raw nf_nat_h323 xt_multiport nf_nat_ftp nf_nat_amanda nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 [ 1561.710172] nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables x_tables lockd grace ctr ccm af_packet bnep msr usbserial uvcvideo btusb btrtl arc4 brcmsmac btbcm videobuf2_vmalloc btintel cordic videobuf2_memops bluetooth intel_powerclamp brcmutil videobuf2_v4l2 videobuf2_core mac80211 videodev coretemp kvm_intel kvm joydev media cfg80211 iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek psmouse snd_hda_codec_generic toshiba_acpi input_leds bcma snd_hda_intel irqbypass snd_hda_codec toshiba_bluetooth cpufreq_ondemand sparse_keymap crc32c_intel cpufreq_conservative industrialio snd_hda_core serio_raw wmi rfkill thermal cpufreq_powersave battery ac snd_hwdep snd_pcm acpi_cpufreq snd_timer snd toshiba_haps mei_me mei e1000e [ 1561.718028] fjes soundcore evdev ptp lpc_ich shpchp pps_core intel_ips tpm_tis tpm_tis_core nvram tpm sch_fq_codel sunrpc ipv6 crc_ccitt autofs4 hid_generic usbhid hid sdhci_pci mmc_block sdhci sr_mod ehci_pci ehci_hcd mmc_core usbcore usb_common i915 video button i2c_algo_bit drm_kms_helper drm [last unloaded: ftdi_sio] [ 1561.721317] CPU: 1 PID: 4473 Comm: kworker/1:0 Not tainted 4.10.54.10.5_debug+ #12 [ 1561.722981] Hardware name: TOSHIBA Satellite R630/Portable PC, BIOS Version 1.40 07/20/2010
[ 1561.724660] Workqueue: events hci_uart_write_work [hci_uart]
[ 1561.726377] task: ffff98d9b4f15100 task.stack: ffffa868c1178000
[ 1561.728568] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
[ 1561.730355] RSP: 0000:ffffa868c117bda8 EFLAGS: 00010046
[ 1561.731993] RAX: 0000000000000000 RBX: 0000000000000296 RCX: 0000000000079ad6 [ 1561.733646] RDX: 0000000000000001 RSI: ffff98da77c5c580 RDI: 0000000000000044 [ 1561.735314] RBP: ffffa868c117bdb0 R08: 000000000001c5a0 R09: ffffffff9965a54a [ 1561.736991] R10: fffff48441dc9a80 R11: ffff98da73403700 R12: 0000000000000030 [ 1561.738641] R13: 0000000000000044 R14: 0000000000000030 R15: ffff98d9b4e50000 [ 1561.740288] FS: 0000000000000000(0000) GS:ffff98da77c40000(0000) knlGS:0000000000000000
[ 1561.741940] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1561.743585] CR2: 0000000000000044 CR3: 0000000128d1a000 CR4: 00000000000006e0
[ 1561.745238] Call Trace:
[ 1561.746882]  skb_dequeue+0x1d/0x70
[ 1561.748512]  bcsp_dequeue+0x27/0x180 [hci_uart]
[ 1561.750130]  ? kfree_skbmem+0x5a/0x60
[ 1561.751742]  hci_uart_write_work+0xc4/0x100 [hci_uart]
[ 1561.753356]  process_one_work+0x151/0x410
[ 1561.754930]  worker_thread+0x69/0x4b0
[ 1561.756507]  kthread+0x114/0x150
[ 1561.758076]  ? rescuer_thread+0x340/0x340
[ 1561.759636]  ? kthread_park+0x90/0x90
[ 1561.761191]  ret_from_fork+0x2c/0x40
[ 1561.762693] Code: 89 e5 e8 82 96 98 ff 5d c3 66 66 66 66 90 55 48 89 e5 53 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 06 48 89 d8 5b 5d c3 89 c6 e8 29 83 98 ff
[ 1561.766034] RIP: _raw_spin_lock_irqsave+0x22/0x40 RSP: ffffa868c117bda8
[ 1561.768033] CR2: 0000000000000044
[ 1561.774555] ---[ end trace 51cc1d3575c0e559 ]---

Please see my analysis in bt_hci_uart_ldisc_crash_snippet_log.txt

With my patchset applied, this crash appears to no longer occur.

If I manage to produce some more useful results then I will post them.

Thanks,

Best regards,
Dean

--
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.

Attachment: v4.10.5_bt_crash_info.tgz
Description: application/compressed-tar


[Index of Archives]     [Bluez Devel]     [Linux Wireless Networking]     [Linux Wireless Personal Area Networking]     [Linux ATH6KL]     [Linux USB Devel]     [Linux Media Drivers]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Big List of Linux Books]

  Powered by Linux