On Tue, 24 May 2022, cael wrote:

> if ldata->no_room is not true, that means kworker has flushed
> at least n characters to break the while loop, so return value of
> n_tty_receive_buf_common is not zero, flush_to_ldisc will
> continue to call this function to flush data to reader if write buffer
> is not empty.

Now you switched to an entirely different case, not the one we were
talking about. ...There is no ldisc->no_room = true race in the case
you now described.

--
 i.

> Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote on Tue, 24 May 2022 at 19:40:
> >
> > On Tue, 24 May 2022, cael wrote:
> >
> > > Thanks for the answer, yes, there exists a race between reader and kworker,
> > > but it's OK. Before checking chars_in_buffer in kworker,
> > > ldata->no_room is set true,
> >
> > Nothing seems to guarantee this.
> >
> > > if reader changes ldata->read_tail in n_tty_read when kworker checks this value
> > > which makes the check fail, then when reader reaches end of n_tty_read,
> > > n_tty_kick_worker will also be called. Besides, kworker and reader may
> > > call n_tty_kick_worker at the same time, this function only queues work
> > > on workqueue, so it's harmless.
> >
> > I'm not worried about the case where both cpus call n_tty_kick_worker but
> > the case where producer cpu sees chars_in_buffer() > 0 and consumer cpu
> > !no_room.
> >
> > --
> >  i.
> >
> > > Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> wrote on Tue, 24 May 2022 at 17:11:
> > > >
> > > > On Tue, 24 May 2022, cael wrote:
> > > >
> > > > > We have met a hang on pty device, the reader was blocking at
> > > > > epoll on master side, the writer was sleeping at wait_woken inside
> > > > > n_tty_write on slave side ,and the write buffer on tty_port was full, we
> > > >
> > > > Space after comma. It would be also useful to tone down usage of "we" in
> > > > the changelog.
> > > >
> > > > > found that the reader and writer would never be woken again and block
> > > > > forever.
> > > > >
> > > > > We thought the problem was caused as a race between reader and
> > > > > kworker as follows:
> > > > > n_tty_read(reader)| n_tty_receive_buf_common(kworker)
> > > > >                   |room = N_TTY_BUF_SIZE - (ldata->read_head - tail)
> > > > >                   |room <= 0
> > > > > copy_from_read_buf|
> > > > > n_tty_kick_worker |
> > > > >                   |ldata->no_room = true
> > > > >
> > > > > After writing to slave device, writer wakes up kworker to flush
> > > > > data on tty_port to reader, and the kworker finds that reader
> > > > > has no room to store data so room <= 0 is met. At this moment,
> > > > > reader consumes all the data on reader buffer and call
> > > > > n_tty_kick_worker to check ldata->no_room and finds that there
> > > > > is no need to call tty_buffer_restart_work to flush data to reader
> > > > > and reader quits reading. Then kworker sets ldata->no_room=true
> > > > > and quits too.
> > > > >
> > > > > If write buffer is not full, writer will wake kworker to flush data
> > > > > again after following writes, but if writer buffer is full and writer
> > > > > goes to sleep, kworker will never be woken again and tty device is
> > > > > blocked.
> > > > >
> > > > > We think this problem can be solved with a check for read buffer
> > > > > inside function n_tty_receive_buf_common, if read buffer is empty and
> > > > > ldata->no_room is true, this means that kworker has more data to flush
> > > > > to read buffer, so a call to n_tty_kick_worker is necessary.
> > > > >
> > > > > Signed-off-by: cael <juanfengpy@xxxxxxxxx>
> > > > > ---
> > > > > diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
> > > > > index efc72104c840..36c7bc033c78 100644
> > > > > --- a/drivers/tty/n_tty.c
> > > > > +++ b/drivers/tty/n_tty.c
> > > > > @@ -1663,6 +1663,9 @@ n_tty_receive_buf_common(struct tty_struct *tty, const unsigned char *cp,
> > > > >         } else
> > > > >                 n_tty_check_throttle(tty);
> > > > >
> > > > > +       if (!chars_in_buffer(tty))
> > > > > +               n_tty_kick_worker(tty);
> > > > > +
> > > >
> > > > chars_in_buffer() accesses ldata->read_tail in producer context so this
> > > > probably just moves the race there?
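
For readers following the thread, the interleaving from the original changelog can be replayed in a small single-threaded C model. This is only a sketch under stated assumptions: struct model, replay() and BUF_SIZE are made-up names for illustration, kick_worker() is a simplified stand-in for n_tty_kick_worker(), and work_queued stands in for tty_buffer_restart_work() having been called. Nothing here runs concurrently or models memory ordering, so it cannot confirm or rule out the chars_in_buffer() concern raised in the review.

```c
#include <assert.h>
#include <stdbool.h>

#define BUF_SIZE 4u /* hypothetical small stand-in for N_TTY_BUF_SIZE */

/* Toy model of the fields named in the thread; not the kernel's n_tty_data. */
struct model {
    unsigned read_head; /* advanced by the kworker (producer) */
    unsigned read_tail; /* advanced by the reader (consumer) */
    bool no_room;       /* set by the kworker when it finds the buffer full */
    bool work_queued;   /* stand-in for tty_buffer_restart_work() being called */
};

static unsigned chars_in_buffer(const struct model *m)
{
    return m->read_head - m->read_tail;
}

/* Simplified stand-in for n_tty_kick_worker(): requeue only if no_room is set. */
static void kick_worker(struct model *m)
{
    if (m->no_room) {
        m->no_room = false;
        m->work_queued = true;
    }
}

/* Replays the changelog's interleaving; returns true if the worker was requeued. */
static bool replay(bool with_patch)
{
    struct model m = { .read_head = BUF_SIZE, .read_tail = 0 }; /* buffer full */

    /* kworker: computes room from a snapshot of read_tail */
    unsigned tail = m.read_tail;
    int room = (int)BUF_SIZE - (int)(m.read_head - tail);
    assert(room <= 0); /* the snapshot shows a full read buffer */

    /* reader: drains the buffer, then kicks while no_room is still false */
    m.read_tail = m.read_head;
    kick_worker(&m);

    /* kworker: acts on the now-stale room value */
    if (room <= 0)
        m.no_room = true;

    /* proposed patch: kworker rechecks the read buffer before returning */
    if (with_patch && !chars_in_buffer(&m))
        kick_worker(&m);

    return m.work_queued;
}
```

In this model replay(false) returns false (nobody requeues the worker, matching the reported hang) and replay(true) returns true. That only covers this one fixed ordering; the case where the producer sees chars_in_buffer() > 0 while the consumer sees !no_room is not exercised by it.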