Re: uart_start NULL pointer dereference

Jiri Slaby <jslaby@xxxxxxx> · Mon, 25 May 2015 12:01:48 +0200

On 05/22/2015, 10:02 PM, Elliott, Robert (Server Storage) wrote:
>> -----Original Message-----
>> From: Peter Hurley [mailto:peter@xxxxxxxxxxxxxxxxxx]
>> Sent: Wednesday, May 20, 2015 5:18 AM
>> To: Elliott, Robert (Server Storage)
>> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; jslaby@xxxxxxx; linux-serial@xxxxxxxxxxxxxxx
>> Subject: Re: uart_start NULL pointer dereference
>>
>> On 05/19/2015 08:10 PM, Elliott, Robert (Server Storage) wrote:
>>> I accidentally pasted a huge bunch of text to the linux serial port and
>>> triggered a NULL pointer dereference in the kernel (4.0).  I have not
>>> tried to replicate it again.
>>
>> Thanks for the report, Robert. Would you please clarify some details?
>>
>> (for the sake of discussion, let's refer to the HP Proliant reported below
>> as the local host)
>>
>> I'm assuming you pasted on the remote, correct?
>> Was the paste at a login prompt, or did the paste kill the session and
>> getty respawned?
> 
> Yes, it was at a login prompt.
> 
> It essentially killed the whole system.  ssh over the network gave 
> login and password prompts, printed:
> Last login: Fri May 15 17:26:45 2015 from 16.100.201.84
> 
> and then hung.
> 
> The serial port was unresponsive, and I didn't have any 
> already open ssh network sessions to use. I don't recall if I
> checked the GUI was still usable; I just reset the machine 
> through the iLO management interface.
> 
> The serial port being used here is the iLO virtual serial port.
> 
>> Please attach the entire log (at least from boot).
> 
> Sorry, I wasn't saving the serial log to a file on that boot; the
> messages were just cut-and-paste from the serial window. 
> 
> Good news: I tried something similar again (this time just 
> pasting some of the previous serial log) and triggered a 
> similar error on my first attempt.  This time uart_put_char
> is upset.
> 
> Again, the serial port became unresponsive.  This time,
> I had an existing ssh network connection open, and it was
> still fine.  It accepted another ssh connection.  After 
> about 4 minutes, though, everything on the system started
> to run very slowly - vim and who took about 2 minutes to
> start up; then vim hung, and running who a second time
> hung, and ssh connections are refused.  The GUI is
> unresponsive; sitting at the CentOS login screen showing
> the time, it's currently 18 minutes behind.
> 
> 
> [76330.588297] BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0
> [76330.592422] IP: [<ffffffff81383d5a>] uart_put_char+0x7a/0xa0
> [76330.595342] PGD 0
> [76330.596512] Oops: 0002 [#1] SMP
> [76330.598263] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_mark iptable_filter ip_tables bridge stp llc vfat fat x86_pkg_temp_thermal coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd ioatdma iTCO_wdt iTCO_vendor_support dca lpc_ich sb_edac nd_btt microcode hpwdt nd_pmem i2c_i801 xhci_pci hpilo edac_core pcspkr xhci_hcd mfd_core nd_acpi pcc_cpufreq libnd wmi acpi_cpufreq shpchp nfsd auth_rpcgss nfs_acl lockd grace uinput sunrpc xfs exportfs sr_mod cdrom sd_mod bnx2x ahci tg3 libahci mdio ptp pps_core hpsa libcrc32c dm_mirror dm_region_hash dm_log dm_mod ipv6 autofs4 efivarfs
> [76330.631435] CPU: 15 PID: 62 Comm: kworker/u82:0 Not tainted 4.0.0+ #49
> [76330.634533] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 04/30/2015
> [76330.637816] Workqueue: events_unbound flush_to_ldisc
> [76330.640165] task: ffff88106d919c80 ti: ffff88106db2c000 task.ti: ffff88106db2c000
> [76330.643684] RIP: 0010:[<ffffffff81383d5a>]  [<ffffffff81383d5a>] uart_put_char+0x7a/0xa0
> [76330.647671] RSP: 0018:ffff88106db2fc28  EFLAGS: 00010006
> [76330.650220] RAX: 0000000000000246 RBX: ffff88046945c198 RCX: 00000000000000c0
> [76330.653593] RDX: 0000000000000000 RSI: 0000000000000020 RDI: ffffffff8202e6f8
> [76330.657147] RBP: ffff88106db2fc48 R08: ffffc900078f0000 R09: 00000000ffffffff
> [76330.660524] R10: 0000000000000004 R11: 0000000000000043 R12: ffffffff8202e6f8
> [76330.663952] R13: 0000000000000020 R14: 0000000000000001 R15: ffffc900078f0000
> [76330.667464] FS:  0000000000000000(0000) GS:ffff88107f940000(0000) knlGS:0000000000000000
> [76330.671341] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [76330.674103] CR2: 00000000000000c0 CR3: 0000000001a0b000 CR4: 00000000001407e0
> [76330.677663] Stack:
> [76330.678622]  0000000000f3f ffff880468cb5000 0000000000000081 00000000000000c5
> [76330.782262]  ffff88106db2fc68 ffffffff81364bf4 00000000000000bf ffffc920078f0000
> [76330.785827]  ffff88106db2fc88 ffffffff813697c8 ffff88106db2fc88 0000000000000f3f
> [76330.789452] Call Trace:
> [76330.790612]  [<ffffffff81364bf4>] tty_put_char+0x24/0x40
> [76330.793145]  [<ffffffff813697c8>] do_output_char+0xb8/0x230
> [76330.795812]  [<ffffffff81369a9e>] __process_echoes+0x15e/0x2a0
> [76330.798705]  [<ffffffff8136cf96>] n_tty_receive_buf_common+0x606/0xba0
> [76330.801822]  [<ffffffff811463a1>] ? bdi_dirty_limit+0x31/0xc0
> [76330.804549]  [<ffffffff8136d544>] n_tty_receive_buf2+0x14/0x20
> [76330.807481]  [<ffffffff813700bd>] flush_to_ldisc+0xdd/0x120
> [76330.810164]  [<ffffffff8106c292>] process_one_work+0x142/0x3f0
> [76330.812977]  [<ffffffff8106c65b>] worker_thread+0x11b/0x460
> [76330.815664]  [<ffffffff8106c540>] ? process_one_work+0x3f0/0x3f0
> [76330.818633]  [<ffffffff81071b49>] kthread+0xc9/0xe0
> [76330.820974]  [<ffffffff81071a80>] ? kthread_create_on_node+0x170/0x170
> [76330.824324]  [<ffffffff815b7b52>] ret_from_fork+0x42/0x70
> [76330.827020]  [<ffffffff81071a80>] ? kthread_create_on_node+0x170/0x170
> [76330.830418] Code: 75 1f 48 89 c6 4c 89 e7 e8 34 34 23 00 44 89 f0 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 84 00 00 00 00 00 48 8b 93 80 01 00 00 41 b6 01 <44> 88 2c 0a 8b b3 88 01 00 00 8d 56 01 81 e2 ff 0f 00 00 89 93
> [76330.841499] RIP  [<ffffffff81383d5a>] uart_put_char+0x7a/0xa0
> [76330.844333]  RSP <ffff88106db2fc28>
> [76330.846211] CR2: 00000000000000c0

If I am looking correctly:
  18:   c3                      retq
  19:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  20:   00
  21:   48 8b 93 80 01 00 00    mov    0x180(%rbx),%rdx
  28:   41 b6 01                mov    $0x1,%r14b
  2b:*  44 88 2c 0a             mov    %r13b,(%rdx,%rcx,1)
    ^^^^^^^^^^^^^^^^^^^ trapping instruction
  2f:   8b b3 88 01 00 00       mov    0x188(%rbx),%esi
  35:   8d 56 01                lea    0x1(%rsi),%edx
  38:   81 e2 ff 0f 00 00       and    $0xfff,%edx
  3e:   89                      .byte 0x89
  3f:   93                      xchg   %eax,%ebx

This means circ->buf is NULL and circ->head is 0xc0. It looks like a
race between put_char (or write) and close (or TIOCSSERIAL or
TIOCSERCONFIG). Hmm, circ->buf is protected by tty_port->mutex, but this
is not (and cannot be) taken as put_char can be called from timer/irq
context too.

The circ->buf should ideally be freed in tty_ops->shutdown, not ->close.
And not freed in "uart_set_info -> uart_shutdown" path at all.

Alternatively, we can free circ->buf under the uart_port->lock spinlock
and move "if (!circ->buf)" check to the critical sections in write and
put_char.

thanks,
-- 
js
suse labs
--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html