Re: [PATCH] tty: serial: qcom_geni_serial: Fix RX cancel command failure

skakit@xxxxxxxxxxxxxx · Mon, 24 Feb 2020 20:17:23 +0530

On 2020-02-19 23:20, Stephen Boyd wrote:
Quoting skakit@xxxxxxxxxxxxxx (2020-02-14 05:17:01)
On 2020-02-13 04:19, Stephen Boyd wrote:
>     driver_probe_device+0x70/0x140
>     __driver_attach_async_helper+0x7c/0xa8
>     async_run_entry_fn+0x60/0x178
>     process_one_work+0x33c/0x640
>     worker_thread+0x2a0/0x470
>     kthread+0x128/0x138
>     ret_from_fork+0x10/0x18
>    Code: 1aca096a 911e0129 b940012b 7100054a (b800450b)
I think the most probable explanation of the crash is, set_termios 
call
is starting RX engine and RX engine is sampling some garbage data from
line, and by the time startup is called, we have few data to read.
How frequently you are able to see this crash? because internally we 
are
unable to reproduce it.

How is set_termios involved? Is that starting the RX side before
uart_startup() is called? Sorry I haven't looked into the code flow 
very
deeply here.

It seems to happen when the bluetooth driver probes so maybe constantly
adding and removing the hci_uart module will cause this to happen for
you? I also run the kernel with many debug options enabled, so maybe 
try
enabling all the debug stuff? I see it randomly right now so I'm not
sure.

In general uart_startup() is called before set_termios() ,but as per the 
crash logs shared, it seems RX engine is active(which can only happen 
from set_termios) before startup() is called.

>
>
> This seems to be the problematic line. We didn't call handle_rx() from
> the stop_rx() path before. And this qcom_geni_serial_stop_rx() function
> is called from qcom_geni_serial_startup(), but most importantly, we
> call
> into this function from startup before we allocate memory for the
> port->rx_fifo member (see the devm_kcalloc() later in
> qcom_geni_serial_port_setup() and see how it's after we stop rx).
>
> Why do we need to flush the rx buffer by reading it into the software
> buffer? Can't we simply ack any outstanding RX interrupts in the
> hardware when we're stopping receive?
We can't simply ack RX_LAST interrupt, there is a sticky bit that get
set on HW level(not exposed to SW) with RX last interrupt. The only 
way
to clear it is flush out RX FIFO HW buffer. The sticky bit can create
problem for future transfers if remained uncleared.
How about we allocate buffer to port->rx_fifo in probe itself?

Ok. If we have to read the rx fifo to flush the buffer then perhaps
write another function that qcom_geni_serial_stop_rx() can use to
indicate that it wants to throw away whatever is in the rx fifo? Or
adjust handle_rx() to check for a NULL fifo pointer and throw it away 
in
that case? When we're setting up this device I don't understand why we
would want to read anything out of the rx fifo that was there before 
the
driver started.

We cannot adjust handle_rx() to check for a NULL fifo pointer as reading 
from RX FIFO is mandatory to clear the sticky bit set.  We are passing 
"true" to handle_rx function so it will read and discard whatever data 
present in RX FIFO, it won't send to upper layers. We are planning to 
post a patch which has rx_fifo buffer allocated in probe itself, in 
order to avoid the NULL dereference.