Hi Peter..sorry I wanted to try with higher baud rates before I got back. I do see the overrun even at higher rates like 38400 or 57600. While a printk of about 60 chars is being spit out, I pasted a rather large command(s) of about 200-1000 chars on the console and I see overrun. Pl see my answers within: On Wed, Feb 18, 2015 at 6:32 AM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote: > On 02/18/2015 03:40 AM, Prasad Koya wrote: >> On Wed, Feb 18, 2015 at 12:35 AM, Prasad Koya <prasad@xxxxxxxxxx> wrote: >>> On Tue, Feb 17, 2015 at 7:34 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> >>> wrote: >>>> On 02/17/2015 06:22 PM, prasad@xxxxxxxxxx wrote: >>>>> From: Prasad Koya <prasad@xxxxxxxxxx> >>>>> >>>>> At slow baud rates, say 9600, it is easy to run into tty buffer overrun >>>>> as interrupts could get disabled for a long time. To avoid that, if >>>>> UART_LSR_DR is set, pick up bytes from rx fifo while waiting for tx >>>>> buffer to be empty. Stop buffering if port->sysrq is true or >>>>> oops_in_progress is set. >>>> >>>> Some of the original information you supplied has been lost in this commit >>>> log. I think it would be helpful if you detailed how the overrun error can >>>> happen, like you did before: >>>> >> >> Yes, its been quite some my first patch or problem was posted..then holiday >> season etc. >> >>> >>>> >>>>> On 09/30/2014 01:30 PM, Prasad Koya wrote: >>>>>> At 9600 baud, we are seeing buffer overruns on our 16550A UARTs. If a >>>>>> command, say 40 chars long, is sent to console and if printk to >>>>>> console happens at same time, we running into receive buffer overrun. >>>>>> Thats because, at 9600 baud, it takes about a millisecond to xmit 1 >>>>>> char and with receive FIFO only 16 chars deep, it is very easy to >>>>>> cause receive buffer overrun. >>>> >>>> What exactly is the use case here? I ask because >>> >>> >> >> In our testing infrastructure we have scripts shooting commands to the >> server (ethernet switch) via its console. Bunch of such servers are >> connected to a terminal server. These could be commands like "ip route add >> to 192.168.0.0/16 dev ma1 via 192.168.0.1" which are over 16 chars in >> length. If server happens to write something out to console in the middle of >> receiving above command, we are seeing receive fifo overrun. > > Thanks for explaining that. > > I just want to be honest right here: I'm reluctant to recommend such an > invasive and consequential change for an automated test harness that > can be worked around at the sender end. > > I'm not implying yours is not a valid use-case; it is. But there are a > number of factors which weigh against it: > 1. It makes serial console output less reliable. This is already a problem > with printk(). > 2. It imposes a significant maintenance burden by linking areas of code > that were separate before. > 3. It's had really limited testing. > 4. There are other workarounds available. I wanted to get feedback on the patch starting from whether this is a valid use case. There are ways to circumvent this overrun but I felt like kernel should be able to handle that (I may be wrong). If the reviewers here agree that this is a valid use-case I'm ok doing required testing/look into any scripts to test this. > > At a minimum this patch should have a CONFIG_ knob which is by default disabled. > Sure. This can be disabled by default. > FWIW, I carry a number of patchsets outside mainline because they're > simply not suitable, either because they're really only useful to me > or because they're too ugly or whatever. > > >>>> 1. the patch is trying to fix overrun during a random printk but >>>> makes no effort to fix overrun during the potentially much, much >>>> longer sysrq printk. >>> >>> >> >> The sysrq-key combinations are manual entry and we'd never see this in that >> case. > > That's what I thought; I just wanted to be sure. > > >>>> 2. I'm struggling to envision the circumstances where a printk is >>>> happening frequently enough to collide with large tty input. Heaviest >>>> printk usage is during boot but that is also typically the lightest >>>> tty input load. >>>> >>> >> >> When we first ran into this issue it was soon after boot where script was >> configuring the server and one of our modules printed something on console. >> There could be other cases where one could run into such situation. > > Ok, this makes sense. The console goes online way earlier than the network > stack. > > >>>> Do you have a test case that I can use to reproduce this? >>>> >>>> Why is the line speed so slow? >>>> >>> >> >> Maybe this low speed requirement was something that came from our customers. > > So it is an actual requirement and not simply because the default console > speed is 9600? > Apparently is the 9600 is the industry standard baud rate for switches/routers, so test our switch from console at same speed. > If possible, I'd rather fix this problem by setting the console line speed > on the kernel command line from your bootloader. > > [...] > >>>>> +/* >>>>> * Wait for transmitter & holding register to empty >>>>> + * If rx is true, then also poll for received characters if it's >>>>> + * safe to process them. This is only used in the SUPPORT_SYSRQ >>>>> + * case, and hopefully the compiler can figure out it's dead code >>>>> + * if that's not the case. >>>>> */ >>>>> -static void wait_for_xmitr(struct uart_8250_port *up, int bits) >>>>> +static void __wait_for_xmitr(struct uart_8250_port *up, int bits, bool >>>>> rx) >>>>> { >>>>> unsigned int status, tmout = 10000; >>>>> >>>>> @@ -1978,6 +2005,11 @@ static void wait_for_xmitr(struct uart_8250_port >>>>> *up, int bits) >>>>> break; >>>>> if (--tmout == 0) >>>>> break; >>>>> + >>>>> + if (rx && status & (UART_LSR_DR | UART_LSR_BI) && >>>>> + !emergency_situation(&up->port)) >>>>> + serial8250_rx_chars(up, status, true); >>>>> + >>>> >>>> serial8250_rx_chars() should not be used in this context because it loops >>>> as long as there is data to be read. That is unacceptably long with >>>> large fifos and fast line speeds. >>>> >>>> serial8250_rx_chars() will have to be refactored so that just a single >>>> char >>>> can be read from here. And what I said about not trying to handle anything >>>> other than data ready. >>>> >>> >> >> What I thought after your first comments was that, since it seems to take >> same amount of time for serial_in() compared to udelay(1), why not read one >> char from rx fifo instead of waiting with udelay(). > > I did think about that too, but I'm not sure it matters much. > The line status is being heavily oversampled as it is. > >>> >>>> >>>>> udelay(1); >>>>> } >>>>> >>>>> @@ -1989,12 +2021,29 @@ static void wait_for_xmitr(struct uart_8250_port >>>>> *up, int bits) >>>>> up->msr_saved_flags |= msr & MSR_SAVE_FLAGS; >>>>> if (msr & UART_MSR_CTS) >>>>> break; >>>>> + >>>>> + if (rx && !emergency_situation(&up->port)) { >>>>> + status = serial_in(up, UART_LSR); >>>>> + >>>>> + up->lsr_saved_flags |= status & >>>>> LSR_SAVE_FLAGS; >>>>> + if (status & (UART_LSR_DR | UART_LSR_BI)) >>>>> + serial8250_rx_chars(up, status, >>>>> true); >>>>> + } >>>>> + >>>> >>>> Is UPF_CONS_FLOW part of the use case? If not, please drop. >>> >> >> No. not related to flow control. > > Ok. > > [...] > >>>>> + >>>>> #ifdef CONFIG_CONSOLE_POLL >>>>> /* >>>>> * Console polling routines for writing and reading from the uart while >>>>> @@ -3168,7 +3217,7 @@ static void serial8250_console_putchar(struct >>>>> uart_port *port, int ch) >>>>> { >>>>> struct uart_8250_port *up = up_to_u8250p(port); >>>>> >>>>> - wait_for_xmitr(up, UART_LSR_THRE); >>>>> + __wait_for_xmitr(up, UART_LSR_THRE, true); >>>> >>>> Why is rx performed unconditionally if the problem set is limited >>>> to slow line speeds (I'm not convinced it is limited to slow >>>> line speeds though)? >>>> >>> >> >> not sure if i understood your question. we are doing tx and rx is done >> only if lsr has DR and BI >> bits set. not unconditional. the issue may be seen even at higher speeds. > > I meant unconditionally with respect to the observed problem: > the commit log indicates that the problem is limited to slow line speeds > on uarts with small fifos, but yet the patch applies to all uarts at > all line speeds. > Got it. > If the problem can happen at any line speed then the commit log shouldn't > say it only happens at slow line speeds. > > Sure. The commit log could be changed to make it more generic. >>>> >>>> What other port types and line speeds has this been tested on besides >>>> your 16550A design? >>>> >>> >> >> I can try at higher speeds as well and see if the issue gets reproduced. I >> don't have other port types. From what i read online, 16550s are quite >> popular commercially but maybe people use it at higher speeds. > > I have many and they do not all behave similarly, that's why I'm asking > which platforms and conditions this has been tested under. > > UART is built into our motherboard on switches and we've seen this behavior on kernels 2.6.x and up to 3.19. thanks. >>>> >>>>> serial_port_out(port, UART_TX, ch); >>>>> } >>>>> >>>>> @@ -3176,6 +3225,10 @@ static void serial8250_console_putchar(struct >>>>> uart_port *port, int ch) >>>>> * Print a string to the serial port trying not to disturb >>>>> * any possible real use of the port... >>>>> * >>>>> + * In order to avoid dropped input if the output is longer than >>>>> + * the available hardware FIFO space, we try to handle incoming >>>>> + * characters during this polling loop. >>>>> + * >>>>> * The console_lock must be held when we get here. >>>>> */ >>>>> static void >>>>> @@ -3214,7 +3267,7 @@ serial8250_console_write(struct console *co, const >>>>> char *s, unsigned int count) >>>>> * Finally, wait for transmitter to become empty >>>>> * and restore the IER >>>>> */ >>>>> - wait_for_xmitr(up, BOTH_EMPTY); >>>>> + __wait_for_xmitr(up, BOTH_EMPTY, true); >>>> >>>> Is rx actually necessary here? How big is the tx shifter? Because the tx >>>> fifo >>>> can have at most 1 char right now and any pending rx was done just before >>>> that. >>>> >> >> Yes, rx here is similar to the rx in serial8250_console_putchar(). > > I think you may misunderstand what I mean here. > > I don't think rx could overflow while waiting for the transmitter to empty > here: > > 1. the last output char has just been written to the tx fifo by > serial8250_console_char() > 2. this means the tx fifo can have at most 1 char in it right now > 3. the rx fifo was just read at most 1us before the tx fifo was written > 4. so this call to wait_for_xmitr() is only waiting for 1 char to be output > plus how ever much data the shifter can hold, which probably isn't much > > > How could the rx fifo overflow in the time it takes the xmitr to send > 1-2 chars, especially since the rx fifo was being polled at 1000x oversample > (every 1us for each 1ms tx char)? > > Regards, > Peter Hurley > -- > To unsubscribe from this list: send the line "unsubscribe linux-serial" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-serial" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html