gadget serial oops in gs_write_room when data rx'd after close

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Felipe,

I've found a problem in the gadget serial driver, which I believe is triggered in the following scenario:

1) Both host and gadget system have a USB gadget serial port open.

2) Gadget side application closes the serial port.

3) Host side sends data to the serial port.

(Steps 2, 3 may need to happen at almost the same time and/or perhaps in the opposite order)

4) Gadget side passes the received data to the TTY layer even though the port is closed locally, which soon results in an oops in gs_write_room since port is NULL in the following:

	struct gs_port	*port = tty->driver_data;
...
	spin_lock_irqsave(&port->port_lock, flags);


The exact oops I see is:

[  132.757406] Unable to handle kernel NULL pointer dereference at virtual address 00000170
[  132.757409] pgd = ffffffc001481000
[  132.757414] [00000170] *pgd=000000026cdd6003, *pud=000000026cdd6003, *pmd=000000026cdd7003, *pte=00e8000003881707
[  132.757418] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[  132.757422] Modules linked in: bcmdhd pci_tegra bluedroid_pm
[  132.757428] CPU: 5 PID: 53 Comm: kworker/u12:1 Not tainted 4.4.38-tegra #1
[  132.757429] Hardware name: quill (DT)
[  132.757439] Workqueue: events_unbound flush_to_ldisc
[  132.757441] task: ffffffc1eb660c80 ti: ffffffc1eb6a8000 task.ti: ffffffc1eb6a8000
[  132.757447] PC is at _raw_spin_lock_irqsave+0x20/0x54
[  132.757452] LR is at gs_write_room+0x1c/0x7c
[  132.757453] pc : [<ffffffc000b2bb70>] lr : [<ffffffc000722fd4>] pstate: 800000c5
[  132.757454] sp : ffffffc1eb6abbe0
[ 132.757456] x29: ffffffc1eb6abbe0 x28: ffffffc07aea9000 [ 132.757458] x27: 0000000000000003 x26: ffffff8015209000 [ 132.757460] x25: 00000000ffffffff x24: ffffff8015209000 [ 132.757462] x23: 0000000000000000 x22: ffffffc07aea9000 [ 132.757463] x21: ffffffc1db1cf820 x20: 0000000000000170 [ 132.757465] x19: 0000000000000000 x18: 00000000004a4000 [ 132.757466] x17: 00000000bc7817d9 x16: ffffffc000b37a60 [ 132.757468] x15: ffffffc000b37a60 x14: 00000000000007a1 [ 132.757470] x13: ffffffc1db1cf823 x12: 0000000000012a38 [ 132.757471] x11: ffffffc000b379d0 x10: 000000000000000d [ 132.757473] x9 : ffffffc1eb6abd20 x8 : ffffffc1db1cf823 [ 132.757474] x7 : 0000000000000000 x6 : 0000000000000000 [ 132.757476] x5 : ffffffc07aea9238 x4 : 0000000000000001 [ 132.757477] x3 : 0000000000000001 x2 : 0000000000000040 [ 132.757479] x1 : ffffffc1eb6a8000 x0 : 0000000000000170 [ 132.757479] [ 132.757481] Process kworker/u12:1 (pid: 53, stack limit = 0xffffffc1eb6a8020)
[  132.757482] Call trace:
[  132.757484] [<ffffffc000b2bb70>] _raw_spin_lock_irqsave+0x20/0x54
[  132.757487] [<ffffffc000466af4>] tty_write_room+0x1c/0x2c
[  132.757489] [<ffffffc0004634a0>] __process_echoes+0x24/0x25c
[  132.757491] [<ffffffc000466208>] n_tty_receive_buf_common+0x1ec/0xa24
[  132.757493] [<ffffffc000466a50>] n_tty_receive_buf2+0x10/0x18
[  132.757495] [<ffffffc000469554>] flush_to_ldisc+0xe4/0x154
[  132.757499] [<ffffffc0000bb9d4>] process_one_work+0x154/0x434
[  132.757501] [<ffffffc0000bbde8>] worker_thread+0x134/0x40c
[  132.757503] [<ffffffc0000c1678>] kthread+0xe0/0xf4
[  132.757506] [<ffffffc000084f90>] ret_from_fork+0x10/0x40
[  132.757508] ---[ end trace 5f1b8572ec9b9a44 ]---
[  132.758707] note: kworker/u12:1[53] exited with preempt_count 1

The patch below fixes this in my testing:

diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index 9b0805f55ad7..16bb24a047d9 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -537,7 +537,7 @@ static void gs_rx_push(unsigned long _port)
 		}
/* push data to (open) tty */
-		if (req->actual) {
+		if (req->actual & tty) {
 			char		*packet = req->buf;
 			unsigned	size = req->actual;
 			unsigned	n;

However, I'm not sure if that patch is entirely complete/correct. Can you comment? In particular:

1) It causes any data that's received while the port is closed to be simply thrown away. I assume this is expected behaviour, as opposed to e.g. buffering the data up until the port is opened later. I note that gs_rx_push does buffer data in the tty_throttled() case, but that's a different case than port closed.

2) gs_rx_push() still calls gs_start_rx() even if there's no TTY attached to the gadget serial device. I wonder if it should stop submitting USB requests until the port is open again? Perhaps the current behaviour is correct since the host will be upset if the gadget refuses to read the data it's sending.

3) I'm not convinced that the patch above completely closes the race that triggers the oops. It should fully solve the issue if the steps I wrote at the very top of this email occur in order, but I think there's still a possible race if the set of steps below occur. Should gs_close() (or some related/child function) do something to synchronously drain the RX queue or RX line discipline, just like it already does to drain port_write_buf?

Possible remaining race window:

1) Both host and gadget system have a USB gadget serial port open.

2) Host side sends data to the serial port.

3) gs_rx_push() processes the received data, and passes it on to tty_insert_flip_string() and tty_flip_buffer_push(). I believe these are asynchronous APIs which simply store the data for later processing.

3) Gadget side application closes the serial port. I believe this sets tty->driver_data to NULL.

4) The TTY work queue entry written in (3) is executed, and the TTY/line-discipline code processes the received data, which again ends up calling gs_write_room() and triggering the same oops there.

Some additional notes on the setup that I used to reproduce the issue:

* Using NVIDIA's downstream Tegra kernel 4.4. While this does make the problem not-an-upstream-issue, the code in the gadget serial driver looks identical to upstream (v4.12) w.r.t. this issue, so although I haven't actually reproduced the issue with upstream SW, I'm fairly confident that the issue does exist there and could be reproduced. Upstream doesn't support the Tegra UDC device yet, so I can't test the exact same HW upstream yet.

* Host machine is a pretty fast/recent core i7 running Ubuntu 14.04 x86-64.

* Gadget machine is an NVIDIA Tegra (Jetson TX2; T186 SoC) running Ubuntu 14.04 AArch64.

* I found the quickest way to reproduce the issue is to run the following once the gadget serial connection has been set up via /sys/kernel/config/usb_gadget:

Tegra-side, so the port keeps getting opened/closed:
while true; do sudo service serial-getty@ttyGS0 restart; sleep 1; done
(Rather than setting up a systemd getty service, you can probably also just run agetty directly and then kill it in a loop.)

Host-side, so data keeps coming in:
while true; do cat some-small-multiline-text-file > /dev/ttyACM0; sleep 1; done

Repro typically takes a second or two at most, whereas with the patch above the device has run for at least 40m without issue.

If the patch does look good, just let me know and I'll send it in the normal manner.
--
To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux