Re: [PATCH] serial: 8250_fsl: Don't report FE, PE and OE twice

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Re: [PATCH] serial: 8250_fsl: Don't report FE, PE and OE twice] On 12/05/2022 (Thu 08:17) Uwe Kleine-König wrote:

> Hello Paul,
> 
> first of all thanks for your cooperation on this ancient topic. It's
> very appreciated.

It was quite the flashback - stuff I'd thought was long forgotten!

> And oh, I failed to Cc the NXP people. I added them to Cc:, maybe one of
> them can add something valuable to the discussion.
> 
> On Wed, May 11, 2022 at 09:29:10PM -0400, Paul Gortmaker wrote:
> > [[PATCH] serial: 8250_fsl: Don't report FE, PE and OE twice] On 11/05/2022 (Wed 11:32) Uwe Kleine-König wrote:
> > 
> > > Some Freescale 8250 implementations have the problem that a single long
> > > break results in one irq per character frame time. The code in
> > > fsl8250_handle_irq() that is supposed to handle that uses the BI bit in
> > > lsr_saved_flags to detect such a situation and then skip the second
> > > received character. However it also stores other error bits and so after
> > > a single frame error the character received in the next irq handling is
> > > passed to the upper layer with a frame error, too.
> > > 
> > > To weaken this problem restrict saving LSR to only the BI bit.
> > 
> > But what is missing is just what "this problem" is - what applications
> > are broken and how?  What are the symptoms?  This is a 15+ year old
> > platform and so one has to ask why this is just being seen now.
> 
> The problem is "However it also stores other error bits and so after a
> single frame error the character received in the next irq handling is
> passed to the upper layer with a frame error, too." which is just the
> sentence before. I hoped this would be understandable :-\

I was trying to get you to describe symptoms at a higher level - as I
said above, at the application level - what were you using that wasn't
working that led you down the path to investigate this?   Transfering
data wasn't reaching the expected max for baud rate, or serial console
was showing lags and dropped characters, or ...?

The false positive error bits description is fine, but it isn't
something that a person in the future could read and then say "Oh I'm
having the same problem - I should backport that!"

> > > Note however that the handling is still broken:
> > > 
> > >  - lsr_saved_flags is updated using orig_lsr which is the LSR content
> > >    for the first received char, but there might be more in the FIFO, so
> > >    a character is thrown away that is received later and not necessarily
> > >    the one following the break.
> > >  - The doubled break might be the 2nd and 3rd char in the FIFO, so the
> > >    workaround doesn't catch these, because serial8250_rx_chars() doesn't
> > >    handle the workaround.
> > >  - lsr_saved_flags might have set UART_LSR_BI at the entry of
> > >    fsl8250_handle_irq() which doesn't originate from
> > >    fsl8250_handle_irq()'s "up->lsr_saved_flags |= orig_lsr &
> > >    UART_LSR_BI;" but from e.g. from serial8250_tx_empty().
> > >  - For a long or a short break this isn't about two characters, but more
> > >    or only a single one.
> > 
> > I've long since flushed the context required to parse the above, sorry.
> > But the part where it says "is still broken" stands out to me.
> 
> The current state is (assuming the errata is accurate and I understood
> it correctly): 
>  - You get a problem for sure if there is a frame error (because the
>    next good char is thrown away).
>  - You get a problem for sure if there is a parity error (because the
>    next good char is thrown away).
>  - You get a problem for sure if there was an overflow (because the
>    first good char after the overflow is thrown away).
>  - The code is racy for break handling. In some unlikely cases the break
>    workaround is applied wrongly.
> 
> (Where "thrown away" is really: passed to the tty layer with an error
> indication, which depending on tty settings results in dropping the
> character or passing it on to userspace.)
> 
> My patch only fixes the first three issues. A part of the reason for the
> uncomplete fix is that I don't have a platform that requires the workaround.
> (I thought I had, but it doesn't show the described behaviour and
> instead behaves nicely, i.e. one irq per break and no stray bits are
> set.)

I was hoping that with the full description of the issue from 12+ years
ago that you'd be able to reproduce it on your platform with the WAR disabled.
I take it that you tried and SysRQ still worked fine?

I also found a copy of an earlier proposed fix from 2010 on patchworks:
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20100301212324.GA1738@xxxxxxxxxxxxx/

Maybe there are some additional details in there of interest?

I wonder if some other intervening change in that wide time span happens
to mask the issue?  Who knows.  I'm not sure if you are that interested;
enough to go try an old kernel to find out...

> That the patch I did is correct is quite obvious for me. Currently the
> fsl8250_handle_irq() function sets the bits BI, OE, FE and PE in

If I recall correctly, it just clears BI - are you sure it sets bits?

> lsr_saved_flags, but only evaluates BI for the workaround. The commit
> that introduced that only talks about BI, the mentioned erratum also
> only mentions BI.
> 
> I can try to make the commit log more convincing. Or if the ability to
> test this code on an affected platform is declared a requirement, I will

I'm not in any position to declare any requirements - just that when you
are bit-bashing to work around some "black box" silicon errata, any
changes might impact whether the WAR is still working or not.

Your change alters lsr_saved_flags for *every* event, even when no breaks
or workarounds have been in play.  I'm not sure what that might trigger.

> drop the topic, just stop using fsl8250_handle_irq() without feeling sad.

That might be the best option in the end but I did notice something else
you might want to consider.  I believe the fsl8250_handle_irq() was just
a copy of the generic serial8250_handle_irq() as it was in 2011, with
the single block of code inserted for the WAR:

+       /* This is the WAR; if last event was BRK, then read and return */
+       if (unlikely(up->lsr_saved_flags & UART_LSR_BI)) {
+               up->lsr_saved_flags &= ~UART_LSR_BI;
+               port->serial_in(port, UART_RX);
+               spin_unlock_irqrestore(&up->port.lock, flags);
+               return 1;
+       }

Of course as we all know - when you copy something, you risk being left
behind when the original gets updated.  I just took a look at today's
generic 8250 one -- "git blame drivers/tty/serial/8250/8250_port.c" and
there are changes that probably have left fsl8250_handle_irq() being
left behind.  A bit more detective work would be required to see
changes prior to the refactoring in the 2015 commit of b6830f6df891.

It probably would be worthwhile to return fsl8250_handle_irq() to be the
"equivalent" of serial8250_handle_irq() + WAR as it was originally.  It
would be hard to argue against mainlining such changes - they are table
stakes.  And who knows, with a bit of luck it might solve your issue too?

Of couse that is more effort than to just stop using the workaround, so I
wouldn't blame you at all if you decided to go that route.

Paul.
--

> 
> Best regards Uwe
> 
> -- Pengutronix e.K.                           | Uwe Kleine-König
> | Industrial Linux Solutions                 |
> https://www.pengutronix.de/ |






[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux