On 05/22/2015, 10:02 PM, Elliott, Robert (Server Storage) wrote:
>> -----Original Message-----
>> From: Peter Hurley [mailto:peter@xxxxxxxxxxxxxxxxxx]
>> Sent: Wednesday, May 20, 2015 5:18 AM
>> To: Elliott, Robert (Server Storage)
>> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; jslaby@xxxxxxx; linux-serial@xxxxxxxxxxxxxxx
>> Subject: Re: uart_start NULL pointer dereference
>>
>> On 05/19/2015 08:10 PM, Elliott, Robert (Server Storage) wrote:
>>> I accidentally pasted a huge bunch of text to the linux serial port and
>>> triggered a NULL pointer dereference in the kernel (4.0). I have not
>>> tried to replicate it again.
>>
>> Thanks for the report, Robert. Would you please clarify some details?
>>
>> (for the sake of discussion, let's refer to the HP Proliant reported below
>> as the local host)
>>
>> I'm assuming you pasted on the remote, correct?
>> Was the paste at a login prompt, or did the paste kill the session and
>> getty respawned?
>
> Yes, it was at a login prompt.
>
> It essentially killed the whole system. ssh over the network gave
> login and password prompts, printed:
> Last login: Fri May 15 17:26:45 2015 from 16.100.201.84
>
> and then hung.
>
> The serial port was unresponsive, and I didn't have any
> already open ssh network sessions to use. I don't recall if I
> checked the GUI was still usable; I just reset the machine
> through the iLO management interface.
>
> The serial port being used here is the iLO virtual serial port.
>
>> Please attach the entire log (at least from boot).
>
> Sorry, I wasn't saving the serial log to a file on that boot; the
> messages were just cut-and-paste from the serial window.
>
> Good news: I tried something similar again (this time just
> pasting some of the previous serial log) and triggered a
> similar error on my first attempt. This time uart_put_char
> is upset.
>
> Again, the serial port became unresponsive. This time,
> I had an existing ssh network connection open, and it was
> still fine. It accepted another ssh connection. After
> about 4 minutes, though, everything on the system started
> to run very slowly - vim and who took about 2 minutes to
> start up; then vim hung, and running who a second time
> hung, and ssh connections are refused. The GUI is
> unresponsive; sitting at the CentOS login screen showing
> the time, it's currently 18 minutes behind.
>
>
> [76330.588297] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0
> [76330.592422] IP: [<ffffffff81383d5a>] uart_put_char+0x7a/0xa0
> [76330.595342] PGD 0
> [76330.596512] Oops: 0002 [#1] SMP
> [76330.598263] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_mark iptable_filter ip_tables bridge stp llc vfat fat x86_pkg_temp_thermal coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ioatdma iTCO_wdt iTCO_vendor_support dca lpc_ich sb_edac nd_btt microcode hpwdt nd_pmem i2c_i801 xhci_pci hpilo edac_core pcspkr xhci_hcd mfd_core nd_acpi pcc_cpufreq libnd wmi acpi_cpufreq shpchp nfsd auth_rpcgss nfs_acl lockd grace uinput sunrpc xfs exportfs sr_mod cdrom sd_mod bnx2x ahci tg3 libahci mdio ptp pps_core hpsa libcrc32c dm_mirror dm_region_hash dm_log dm_mod ipv6 autofs4 efivarfs
> [76330.631435] CPU: 15 PID: 62 Comm: kworker/u82:0 Not tainted 4.0.0+ #49
> [76330.634533] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 04/30/2015
> [76330.637816] Workqueue: events_unbound flush_to_ldisc
> [76330.640165] task: ffff88106d919c80 ti: ffff88106db2c000 task.ti: ffff88106db2c000
> [76330.643684] RIP: 0010:[<ffffffff81383d5a>] [<ffffffff81383d5a>] uart_put_char+0x7a/0xa0
> [76330.647671] RSP: 0018:ffff88106db2fc28 EFLAGS: 00010006
> [76330.650220] RAX: 0000000000000246 RBX: ffff88046945c198 RCX: 00000000000000c0
> [76330.653593] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffffffff8202e6f8
> [76330.657147] RBP: ffff88106db2fc48 R08: ffffc900078f0000 R09: 00000000ffffffff
> [76330.660524] R10: 0000000000000004 R11: 0000000000000043 R12: ffffffff8202e6f8
> [76330.663952] R13: 0000000000000020 R14: 0000000000000001 R15: ffffc900078f0000
> [76330.667464] FS: 0000000000000000(0000) GS:ffff88107f940000(0000) knlGS:0000000000000000
> [76330.671341] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [76330.674103] CR2: 00000000000000c0 CR3: 0000000001a0b000 CR4: 00000000001407e0
> [76330.677663] Stack:
> [76330.678622] 0000000000f3f ffff880468cb5000 0000000000000081 00000000000000c5
> [76330.782262] ffff88106db2fc68 ffffffff81364bf4 00000000000000bf ffffc920078f0000
> [76330.785827] ffff88106db2fc88 ffffffff813697c8 ffff88106db2fc88 0000000000000f3f
> [76330.789452] Call Trace:
> [76330.790612] [<ffffffff81364bf4>] tty_put_char+0x24/0x40
> [76330.793145] [<ffffffff813697c8>] do_output_char+0xb8/0x230
> [76330.795812] [<ffffffff81369a9e>] __process_echoes+0x15e/0x2a0
> [76330.798705] [<ffffffff8136cf96>] n_tty_receive_buf_common+0x606/0xba0
> [76330.801822] [<ffffffff811463a1>] ? bdi_dirty_limit+0x31/0xc0
> [76330.804549] [<ffffffff8136d544>] n_tty_receive_buf2+0x14/0x20
> [76330.807481] [<ffffffff813700bd>] flush_to_ldisc+0xdd/0x120
> [76330.810164] [<ffffffff8106c292>] process_one_work+0x142/0x3f0
> [76330.812977] [<ffffffff8106c65b>] worker_thread+0x11b/0x460
> [76330.815664] [<ffffffff8106c540>] ? process_one_work+0x3f0/0x3f0
> [76330.818633] [<ffffffff81071b49>] kthread+0xc9/0xe0
> [76330.820974] [<ffffffff81071a80>] ? kthread_create_on_node+0x170/0x170
> [76330.824324] [<ffffffff815b7b52>] ret_from_fork+0x42/0x70
> [76330.827020] [<ffffffff81071a80>] ? kthread_create_on_node+0x170/0x170
> [76330.830418] Code: 75 1f 48 89 c6 4c 89 e7 e8 34 34 23 00 44 89 f0 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 84 00 00 00 00 00 48 8b 93 80 01 00 00 41 b6 01 <44> 88 2c 0a 8b b3 88 01 00 00 8d 56 01 81 e2 ff 0f 00 00 89 93
> [76330.841499] RIP [<ffffffff81383d5a>] uart_put_char+0x7a/0xa0
> [76330.844333] RSP <ffff88106db2fc28>
> [76330.846211] CR2: 00000000000000c0

If I am looking correctly:

  18:   c3                      retq
  19:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  20:   00
  21:   48 8b 93 80 01 00 00    mov    0x180(%rbx),%rdx
  28:   41 b6 01                mov    $0x1,%r14b
  2b:*  44 88 2c 0a             mov    %r13b,(%rdx,%rcx,1)
                                ^^^^^^^^^^^^^^^^^^^ trapping instruction
  2f:   8b b3 88 01 00 00       mov    0x188(%rbx),%esi
  35:   8d 56 01                lea    0x1(%rsi),%edx
  38:   81 e2 ff 0f 00 00       and    $0xfff,%edx
  3e:   89                      .byte 0x89
  3f:   93                      xchg   %eax,%ebx

This means circ->buf is NULL and circ->head is 0xc0.

It looks like a race between put_char (or write) and close (or
TIOCSSERIAL or TIOCSERCONFIG).

Hmm, circ->buf is protected by tty_port->mutex, but this is not (and
cannot be) taken as put_char can be called from timer/irq context too.

The circ->buf should ideally be freed in tty_ops->shutdown, not ->close.
And not freed in "uart_set_info -> uart_shutdown" path at all.

Alternatively, we can free circ->buf under the uart_port->lock spinlock
and move "if (!circ->buf)" check to the critical sections in write and
put_char.

thanks,
-- 
js
suse labs
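A minimal userspace sketch of the second option, under stated assumptions and not serial_core code: a pthread mutex stands in for the uart_port->lock spinlock (spin_lock_irqsave() in the kernel), malloc()/free() stand in for the xmit page, and struct model_port, model_port_init(), model_put_char() and model_shutdown() are hypothetical names. The store and index update mirror the decoded instructions (buf[head] = c, then head = (head + 1) & 0xfff). Because the NULL check and the clearing of buf happen under the same lock, a put_char racing with shutdown either finishes before the buffer is detached or sees buf == NULL and returns 0, rather than writing to NULL + head (address 0xc0 in the oops). The sketch clears the pointer under the lock and calls free() after unlocking; once buf is NULL no caller can reach the old buffer, so this should give the same guarantee as freeing while holding the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for UART_XMIT_SIZE; the "and $0xfff" in the
 * disassembly implies a 4096-byte ring. */
#define MODEL_XMIT_SIZE 4096

/* Hypothetical model of the xmit circ_buf plus the port lock. */
struct model_port {
	pthread_mutex_t lock; /* stands in for uart_port->lock */
	char *buf;            /* circ->buf */
	int head;             /* circ->head */
	int tail;             /* circ->tail */
};

/* Free slots in the ring, same formula as CIRC_SPACE(). */
static int model_circ_space(const struct model_port *p)
{
	return (p->tail - (p->head + 1)) & (MODEL_XMIT_SIZE - 1);
}

/* Allocate the ring and init the lock; returns 0 on success. */
static int model_port_init(struct model_port *p)
{
	p->head = p->tail = 0;
	p->buf = malloc(MODEL_XMIT_SIZE);
	if (!p->buf)
		return -1;
	return pthread_mutex_init(&p->lock, NULL);
}

/* put_char with the "if (!buf)" check moved inside the critical
 * section, so it cannot interleave with model_shutdown(). */
static int model_put_char(struct model_port *p, char c)
{
	int ret = 0;

	pthread_mutex_lock(&p->lock);
	if (p->buf && model_circ_space(p) != 0) {
		assert(p->head >= 0 && p->head < MODEL_XMIT_SIZE);
		p->buf[p->head] = c;
		p->head = (p->head + 1) & (MODEL_XMIT_SIZE - 1);
		ret = 1;
	}
	pthread_mutex_unlock(&p->lock);
	return ret;
}

/* Detach the buffer under the same lock, then free it. */
static void model_shutdown(struct model_port *p)
{
	char *old;

	pthread_mutex_lock(&p->lock);
	old = p->buf;
	p->buf = NULL;
	p->head = p->tail = 0;
	pthread_mutex_unlock(&p->lock);
	free(old);
}
```

With this ordering, a writer thread calling model_put_char() in a loop while another thread calls model_shutdown() should never write through a NULL buf; calls made after shutdown simply return 0.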