Re: [PATCH 1/1] tty, serial, 8250: to avoid recv fifo overrun, read rx fifo during xmit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Peter..sorry I wanted to try with higher baud rates before I got
back. I do see the overrun even at higher rates like 38400 or 57600.
While a printk of about 60 chars is being spit out, I pasted a rather
large command(s) of about 200-1000 chars on the console and I see
overrun. Pl see my answers within:

On Wed, Feb 18, 2015 at 6:32 AM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
> On 02/18/2015 03:40 AM, Prasad Koya wrote:
>> On Wed, Feb 18, 2015 at 12:35 AM, Prasad Koya <prasad@xxxxxxxxxx> wrote:
>>> On Tue, Feb 17, 2015 at 7:34 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx>
>>> wrote:
>>>> On 02/17/2015 06:22 PM, prasad@xxxxxxxxxx wrote:
>>>>> From: Prasad Koya <prasad@xxxxxxxxxx>
>>>>>
>>>>> At slow baud rates, say 9600, it is easy to run into tty buffer overrun
>>>>> as interrupts could get disabled for a long time. To avoid that, if
>>>>> UART_LSR_DR is set, pick up bytes from rx fifo while waiting for tx
>>>>> buffer to be empty. Stop buffering if port->sysrq is true or
>>>>> oops_in_progress is set.
>>>>
>>>> Some of the original information you supplied has been lost in this commit
>>>> log. I think it would be helpful if you detailed how the overrun error can
>>>> happen, like you did before:
>>>>
>>
>> Yes, its been quite some my first patch or problem was posted..then holiday
>> season etc.
>>
>>>
>>>>
>>>>> On 09/30/2014 01:30 PM, Prasad Koya wrote:
>>>>>> At 9600 baud, we are seeing buffer overruns on our 16550A UARTs. If a
>>>>>> command, say 40 chars long, is sent to console and if printk to
>>>>>> console happens at same time, we running into receive buffer overrun.
>>>>>> Thats because, at 9600 baud, it takes about a millisecond to xmit 1
>>>>>> char and with receive FIFO only 16 chars deep, it is very easy to
>>>>>> cause receive buffer overrun.
>>>>
>>>> What exactly is the use case here? I ask because
>>>
>>>
>>
>> In our testing infrastructure we have scripts shooting commands to the
>> server (ethernet switch) via its console. Bunch of such servers are
>> connected to a terminal server. These could be commands like "ip route add
>> to 192.168.0.0/16 dev ma1 via 192.168.0.1" which are over 16 chars in
>> length. If server happens to write something out to console in the middle of
>> receiving above command, we are seeing receive fifo overrun.
>
> Thanks for explaining that.
>
> I just want to be honest right here: I'm reluctant to recommend such an
> invasive and consequential change for an automated test harness that
> can be worked around at the sender end.
>
> I'm not implying yours is not a valid use-case; it is. But there are a
> number of factors which weigh against it:
> 1. It makes serial console output less reliable. This is already a problem
>    with printk().
> 2. It imposes a significant maintenance burden by linking areas of code
>    that were separate before.
> 3. It's had really limited testing.
> 4. There are other workarounds available.

I wanted to get feedback on the patch starting from whether this is a
valid use case. There are ways to circumvent this overrun but I felt
like kernel should be able to handle that (I may be wrong). If the
reviewers here agree that this is a valid use-case I'm ok doing
required testing/look into any scripts to test this.

>
> At a minimum this patch should have a CONFIG_ knob which is by default disabled.
>

Sure. This can be disabled by default.

> FWIW, I carry a number of patchsets outside mainline because they're
> simply not suitable, either because they're really only useful to me
> or because they're too ugly or whatever.
>
>
>>>> 1. the patch is trying to fix overrun during a random printk but
>>>>    makes no effort to fix overrun during the potentially much, much
>>>>    longer sysrq printk.
>>>
>>>
>>
>> The sysrq-key combinations are manual entry and we'd never see this in that
>> case.
>
> That's what I thought; I just wanted to be sure.
>
>
>>>> 2. I'm struggling to envision the circumstances where a printk is
>>>>    happening frequently enough to collide with large tty input. Heaviest
>>>>    printk usage is during boot but that is also typically the lightest
>>>>    tty input load.
>>>>
>>>
>>
>> When we first ran into this issue it was soon after boot where script was
>> configuring the server and one of our modules printed something on console.
>> There could be other cases where one could run into such situation.
>
> Ok, this makes sense. The console goes online way earlier than the network
> stack.
>
>
>>>> Do you have a test case that I can use to reproduce this?
>>>>
>>>> Why is the line speed so slow?
>>>>
>>>
>>
>> Maybe this low speed requirement was something that came from our customers.
>
> So it is an actual requirement and not simply because the default console
> speed is 9600?
>

Apparently is the 9600 is the industry standard baud rate for
switches/routers, so test our switch from console at same speed.

> If possible, I'd rather fix this problem by setting the console line speed
> on the kernel command line from your bootloader.
>
> [...]
>
>>>>> +/*
>>>>>   *   Wait for transmitter & holding register to empty
>>>>> + *   If rx is true, then also poll for received characters if it's
>>>>> + *   safe to process them.  This is only used in the SUPPORT_SYSRQ
>>>>> + *   case, and hopefully the compiler can figure out it's dead code
>>>>> + *   if that's not the case.
>>>>>   */
>>>>> -static void wait_for_xmitr(struct uart_8250_port *up, int bits)
>>>>> +static void __wait_for_xmitr(struct uart_8250_port *up, int bits, bool
>>>>> rx)
>>>>>  {
>>>>>       unsigned int status, tmout = 10000;
>>>>>
>>>>> @@ -1978,6 +2005,11 @@ static void wait_for_xmitr(struct uart_8250_port
>>>>> *up, int bits)
>>>>>                       break;
>>>>>               if (--tmout == 0)
>>>>>                       break;
>>>>> +
>>>>> +             if (rx && status & (UART_LSR_DR | UART_LSR_BI) &&
>>>>> +                 !emergency_situation(&up->port))
>>>>> +                     serial8250_rx_chars(up, status, true);
>>>>> +
>>>>
>>>> serial8250_rx_chars() should not be used in this context because it loops
>>>> as long as there is data to be read. That is unacceptably long with
>>>> large fifos and fast line speeds.
>>>>
>>>> serial8250_rx_chars() will have to be refactored so that just a single
>>>> char
>>>> can be read from here. And what I said about not trying to handle anything
>>>> other than data ready.
>>>>
>>>
>>
>> What I thought after your first comments was that, since it seems to take
>> same amount of time for serial_in() compared to udelay(1), why not read one
>> char from rx fifo instead of waiting with udelay().
>
> I did think about that too, but I'm not sure it matters much.
> The line status is being heavily oversampled as it is.
>
>>>
>>>>
>>>>>               udelay(1);
>>>>>       }
>>>>>
>>>>> @@ -1989,12 +2021,29 @@ static void wait_for_xmitr(struct uart_8250_port
>>>>> *up, int bits)
>>>>>                       up->msr_saved_flags |= msr & MSR_SAVE_FLAGS;
>>>>>                       if (msr & UART_MSR_CTS)
>>>>>                               break;
>>>>> +
>>>>> +                     if (rx && !emergency_situation(&up->port)) {
>>>>> +                             status = serial_in(up, UART_LSR);
>>>>> +
>>>>> +                             up->lsr_saved_flags |= status &
>>>>> LSR_SAVE_FLAGS;
>>>>> +                             if (status & (UART_LSR_DR | UART_LSR_BI))
>>>>> +                                     serial8250_rx_chars(up, status,
>>>>> true);
>>>>> +                     }
>>>>> +
>>>>
>>>> Is UPF_CONS_FLOW part of the use case? If not, please drop.
>>>
>>
>> No. not related to flow control.
>
> Ok.
>
> [...]
>
>>>>> +
>>>>>  #ifdef CONFIG_CONSOLE_POLL
>>>>>  /*
>>>>>   * Console polling routines for writing and reading from the uart while
>>>>> @@ -3168,7 +3217,7 @@ static void serial8250_console_putchar(struct
>>>>> uart_port *port, int ch)
>>>>>  {
>>>>>       struct uart_8250_port *up = up_to_u8250p(port);
>>>>>
>>>>> -     wait_for_xmitr(up, UART_LSR_THRE);
>>>>> +     __wait_for_xmitr(up, UART_LSR_THRE, true);
>>>>
>>>> Why is rx performed unconditionally if the problem set is limited
>>>> to slow line speeds (I'm not convinced it is limited to slow
>>>> line speeds though)?
>>>>
>>>
>>
>> not sure if i understood your question. we are doing tx and rx is done
>> only if lsr has DR and BI
>> bits set. not unconditional. the issue may be seen even at higher speeds.
>
> I meant unconditionally with respect to the observed problem:
> the commit log indicates that the problem is limited to slow line speeds
> on uarts with small fifos, but yet the patch applies to all uarts at
> all line speeds.
>

Got it.

> If the problem can happen at any line speed then the commit log shouldn't
> say it only happens at slow line speeds.
>
>

Sure. The commit log could be changed to make it more generic.

>>>>
>>>> What other port types and line speeds has this been tested on besides
>>>> your 16550A design?
>>>>
>>>
>>
>> I can try at higher speeds as well and see if the issue gets reproduced. I
>> don't have other port types. From what i read online, 16550s are quite
>> popular commercially but maybe people use it at higher speeds.
>
> I have many and they do not all behave similarly, that's why I'm asking
> which platforms and conditions this has been tested under.
>
>

UART is built into our motherboard on switches and we've seen this
behavior on kernels 2.6.x and up to 3.19.

thanks.

>>>>
>>>>>       serial_port_out(port, UART_TX, ch);
>>>>>  }
>>>>>
>>>>> @@ -3176,6 +3225,10 @@ static void serial8250_console_putchar(struct
>>>>> uart_port *port, int ch)
>>>>>   *   Print a string to the serial port trying not to disturb
>>>>>   *   any possible real use of the port...
>>>>>   *
>>>>> + *   In order to avoid dropped input if the output is longer than
>>>>> + *   the available hardware FIFO space, we try to handle incoming
>>>>> + *   characters during this polling loop.
>>>>> + *
>>>>>   *   The console_lock must be held when we get here.
>>>>>   */
>>>>>  static void
>>>>> @@ -3214,7 +3267,7 @@ serial8250_console_write(struct console *co, const
>>>>> char *s, unsigned int count)
>>>>>        *      Finally, wait for transmitter to become empty
>>>>>        *      and restore the IER
>>>>>        */
>>>>> -     wait_for_xmitr(up, BOTH_EMPTY);
>>>>> +     __wait_for_xmitr(up, BOTH_EMPTY, true);
>>>>
>>>> Is rx actually necessary here? How big is the tx shifter? Because the tx
>>>> fifo
>>>> can have at most 1 char right now and any pending rx was done just before
>>>> that.
>>>>
>>
>> Yes, rx here is similar to the rx in serial8250_console_putchar().
>
> I think you may misunderstand what I mean here.
>
> I don't think rx could overflow while waiting for the transmitter to empty
> here:
>
> 1. the last output char has just been written to the tx fifo by
>    serial8250_console_char()
> 2. this means the tx fifo can have at most 1 char in it right now
> 3. the rx fifo was just read at most 1us before the tx fifo was written
> 4. so this call to wait_for_xmitr() is only waiting for 1 char to be output
>    plus how ever much data the shifter can hold, which probably isn't much
>
>
> How could the rx fifo overflow in the time it takes the xmitr to send
> 1-2 chars, especially since the rx fifo was being polled at 1000x oversample
> (every 1us for each 1ms tx char)?
>
> Regards,
> Peter Hurley
> --
> To unsubscribe from this list: send the line "unsubscribe linux-serial" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux