On 2020-02-13 04:19, Stephen Boyd wrote:
Quoting satya priya (2020-02-11 02:13:02)
RX cancel command fails when BT is switched on and off multiple times.
To handle this, poll for the cancel bit in SE_GENI_S_IRQ_STATUS
register
instead of SE_GENI_S_CMD_CTRL_REG.
As per the HPG update, handle the RX last bit after cancel command
and flush out the RX FIFO buffer.
Signed-off-by: satya priya <skakit@xxxxxxxxxxxxxx>
---
I'm pretty sure I saw an oops with this patch.
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000000
Mem abort info:
ESR = 0x96000046
EC = 0x25: DABT (current EL), IL = 32 bits
Bluetooth: hci_ldisc.c:hci_uart_register_proto() HCI UART protocol
QCA registered
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000046
CM = 0, WnR = 1
user pgtable: 4k pages, 39-bit VAs, pgdp=00000001b4645000
[0000000000000000] pgd=00000001ac579003, pud=00000001ac579003,
pmd=0000000000000000
Internal error: Oops: 96000046 [#1] PREEMPT SMP
Modules linked in: hci_uart(+) qcom_spmi_temp_alarm btqca
qcom_spmi_adc5 qcom_vadc_common bluetooth ecdh_generic ecc venus_core
v4l2_mem2mem videobuf2_v4l2 videobuf2_common ipa qcom_q6v5_mss
qcom_q6v5_ipa_notify qcom_common qcom_q6v5 xt_MASQUERADE fuse
rmtfs_mem iio_trig_sysfs cros_ec_sensors_ring cros_ec_sensors
cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf
cros_ec_sensorsupport ath10k_snoc qmi_helpers ath10k_core ath lzo_rle
lzo_compress mac80211 zram cfg80211 joydev
CPU: 7 PID: 258 Comm: kworker/u16:3 Tainted: G S W
5.4.17 #19
Workqueue: events_unbound async_run_entry_fn
pstate: 60c00009 (nZCv daif +PAN +UAO)
pc : handle_rx_uart+0x64/0x278
lr : qcom_geni_serial_handle_rx+0x84/0x90
sp : ffffff814348f960
x29: ffffff814348f960 x28: ffffffd01ac24288
x27: 0000000000000018 x26: 0000000000000002
x25: 0000000000000001 x24: ffffff8146341348
x23: ffffff8146341000 x22: ffffffd01accc978
x21: ffffff8146341000 x20: 0000000000000001
x19: 0000000000000001 x18: ffffffd01b22d000
x17: 0000000000008000 x16: 00000000000000b0
x15: ffffffd01afdbdd0 x14: ffffffd01b3edde0
x13: ffffffd01b7fb000 x12: 0000000000000001
x11: 0000000000000000 x10: 0000000000000000
x9 : ffffffd010344780 x8 : 0000000000000000
x7 : ffffffd019d8e768 x6 : 0000000000000000
x5 : ffffffd01adbb000 x4 : 0000000000008004
x3 : 0000000000000001 x2 : 0000000000000001
x1 : 0000000000000001 x0 : ffffffd01accc978
Call trace:
handle_rx_uart+0x64/0x278
qcom_geni_serial_handle_rx+0x84/0x90
qcom_geni_serial_stop_rx+0x110/0x180
qcom_geni_serial_port_setup+0x68/0x1b0
qcom_geni_serial_startup+0x24/0x70
uart_startup+0x164/0x28c
uart_port_activate+0x6c/0xbc
tty_port_open+0xa8/0x114
uart_open+0x28/0x38
ttyport_open+0x7c/0x164
serdev_device_open+0x38/0xe4
hci_uart_register_device+0x54/0x2e8 [hci_uart]
qca_serdev_probe+0x1c4/0x374 [hci_uart]
serdev_drv_probe+0x3c/0x64
really_probe+0x144/0x3f8
driver_probe_device+0x70/0x140
__driver_attach_async_helper+0x7c/0xa8
async_run_entry_fn+0x60/0x178
process_one_work+0x33c/0x640
worker_thread+0x2a0/0x470
kthread+0x128/0x138
ret_from_fork+0x10/0x18
Code: 1aca096a 911e0129 b940012b 7100054a (b800450b)
I think the most probable explanation of the crash is, set_termios call
is starting RX engine and RX engine is sampling some garbage data from
line, and by the time startup is called, we have few data to read.
How frequently you are able to see this crash? because internally we are
unable to reproduce it.
diff --git a/drivers/tty/serial/qcom_geni_serial.c
b/drivers/tty/serial/qcom_geni_serial.c
index 191abb1..0bd1684 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -599,7 +600,7 @@ static void qcom_geni_serial_stop_rx(struct
uart_port *uport)
u32 irq_en;
u32 status;
struct qcom_geni_serial_port *port = to_dev_port(uport,
uport);
- u32 irq_clear = S_CMD_DONE_EN;
+ u32 s_irq_status;
irq_en = readl(uport->membase + SE_GENI_S_IRQ_EN);
irq_en &= ~(S_RX_FIFO_WATERMARK_EN | S_RX_FIFO_LAST_EN);
@@ -615,10 +616,19 @@ static void qcom_geni_serial_stop_rx(struct
uart_port *uport)
return;
geni_se_cancel_s_cmd(&port->se);
- qcom_geni_serial_poll_bit(uport, SE_GENI_S_CMD_CTRL_REG,
- S_GENI_CMD_CANCEL, false);
+ qcom_geni_serial_poll_bit(uport, SE_GENI_S_IRQ_STATUS,
+ S_CMD_CANCEL_EN, true);
+ /*
+ * If timeout occurs secondary engine remains active
+ * and Abort sequence is executed.
+ */
+ s_irq_status = readl(uport->membase + SE_GENI_S_IRQ_STATUS);
+ /* Flush the Rx buffer */
+ if (s_irq_status & S_RX_FIFO_LAST_EN)
+ qcom_geni_serial_handle_rx(uport, true);
This seems to be the problematic line. We didn't call handle_rx() from
the stop_rx() path before. And this qcom_geni_serial_stop_rx() function
is called from qcom_geni_serial_startup(), but most importantly, we
call
into this function from startup before we allocate memory for the
port->rx_fifo member (see the devm_kcalloc() later in
qcom_geni_serial_port_setup() and see how it's after we stop rx).
Why do we need to flush the rx buffer by reading it into the software
buffer? Can't we simply ack any outstanding RX interrupts in the
hardware when we're stopping receive?
We can't simply ack RX_LAST interrupt, there is a sticky bit that get
set on HW level(not exposed to SW) with RX last interrupt. The only way
to clear it is flush out RX FIFO HW buffer. The sticky bit can create
problem for future transfers if remained uncleared.
How about we allocate buffer to port->rx_fifo in probe itself?
+ writel(s_irq_status, uport->membase + SE_GENI_S_IRQ_CLEAR);
+
status = readl(uport->membase + SE_GENI_STATUS);
- writel(irq_clear, uport->membase + SE_GENI_S_IRQ_CLEAR);
if (status & S_GENI_CMD_ACTIVE)
qcom_geni_serial_abort_rx(uport);