On Tue, 24 May 2022, cael wrote:

> We have met a hang on pty device, the reader was blocking at
> epoll on master side, the writer was sleeping at wait_woken inside
> n_tty_write on slave side ,and the write buffer on tty_port was full, we

The space belongs after the comma. It would also be useful to tone down
the usage of "we" in the changelog.

> found that the reader and writer would never be woken again and block
> forever.
>
> We thought the problem was caused as a race between reader and
> kworker as follows:
> n_tty_read(reader)| n_tty_receive_buf_common(kworker)
>                   |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
>                   |room <= 0
> copy_from_read_buf|
> n_tty_kick_worker |
>                   |ldata->no_room = true
>
> After writing to slave device, writer wakes up kworker to flush
> data on tty_port to reader, and the kworker finds that reader
> has no room to store data so room <= 0 is met. At this moment,
> reader consumes all the data on reader buffer and call
> n_tty_kick_worker to check ldata->no_room and finds that there
> is no need to call tty_buffer_restart_work to flush data to reader
> and reader quits reading. Then kworker sets ldata->no_room=true
> and quits too.
>
> If write buffer is not full, writer will wake kworker to flush data
> again after following writes, but if writer buffer is full and writer
> goes to sleep, kworker will never be woken again and tty device is
> blocked.
>
> We think this problem can be solved with a check for read buffer
> inside function n_tty_receive_buf_common, if read buffer is empty and
> ldata->no_room is true, this means that kworker has more data to flush
> to read buffer, so a call to n_tty_kick_worker is necessary.
>
> Signed-off-by: cael <juanfengpy@xxxxxxxxx>
> ---
> diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> index efc72104c840..36c7bc033c78 100644
> --- a/drivers/tty/n_tty.c
> +++ b/drivers/tty/n_tty.c
> @@ -1663,6 +1663,9 @@ n_tty_receive_buf_common(struct tty_struct *tty,
> const unsigned char *cp,
>  	} else
>  		n_tty_check_throttle(tty);
>
> +	if (!chars_in_buffer(tty))
> +		n_tty_kick_worker(tty);
> +

chars_in_buffer() accesses ldata->read_tail in producer context so this
probably just moves the race there?

-- 
 i.
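
For reference, the interleaving from the quoted changelog can be replayed
in a single-threaded userspace model. This is only a sketch under stated
assumptions: struct model, MODEL_BUF_SIZE, work_queued and every function
name below are hypothetical and are not kernel code. The model assumes
that the reader's kick clears no_room and restarts the worker only when
no_room is already set, as the changelog describes.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_BUF_SIZE 4	/* hypothetical stand-in for N_TTY_BUF_SIZE */

/* Hypothetical model of the state shared by reader and kworker. */
struct model {
	unsigned int read_head;	/* advanced by the worker (producer) */
	unsigned int read_tail;	/* advanced by the reader (consumer) */
	bool no_room;		/* set by the worker when the buffer is full */
	bool work_queued;	/* set when the reader restarts the worker */
};

/* Worker side: room = size - (read_head - tail), as in the changelog. */
static int worker_compute_room(const struct model *m)
{
	return MODEL_BUF_SIZE - (int)(m->read_head - m->read_tail);
}

/* Reader side: consume everything, then kick only if no_room is set. */
static void reader_drain_and_kick(struct model *m)
{
	m->read_tail = m->read_head;	/* models copy_from_read_buf */
	if (m->no_room) {		/* models the n_tty_kick_worker check */
		m->no_room = false;
		m->work_queued = true;
	}
}

/* Racy order: the reader runs between the room check and the flag store. */
bool replay_race(void)
{
	struct model m = { .read_head = MODEL_BUF_SIZE };
	int room = worker_compute_room(&m);

	assert(room <= 0);		/* buffer is full at this point */
	reader_drain_and_kick(&m);	/* flag not set yet, so no kick */
	m.no_room = true;		/* worker stores the flag too late */

	/* Flag is set, but nothing is left to act on it. */
	return m.no_room && !m.work_queued;
}

/* Benign order: the flag is stored before the reader drains the buffer. */
bool replay_in_order(void)
{
	struct model m = { .read_head = MODEL_BUF_SIZE };
	int room = worker_compute_room(&m);

	assert(room <= 0);
	m.no_room = true;
	reader_drain_and_kick(&m);	/* reader sees the flag and kicks */

	return m.work_queued;
}
```

In this model replay_race() returns true, meaning no_room is left set with
no worker queued, and replay_in_order() returns true because the reader
sees the flag and requeues the worker. The model says nothing about
whether reading read_tail from the producer side is itself safe.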