Hello Paul, On Thu, May 12, 2022 at 11:46:21AM -0400, Paul Gortmaker wrote: > [Re: [PATCH] serial: 8250_fsl: Don't report FE, PE and OE twice] On 12/05/2022 (Thu 08:17) Uwe Kleine-König wrote: > > On Wed, May 11, 2022 at 09:29:10PM -0400, Paul Gortmaker wrote: > > > [[PATCH] serial: 8250_fsl: Don't report FE, PE and OE twice] On 11/05/2022 (Wed 11:32) Uwe Kleine-König wrote: > > > > > > > Some Freescale 8250 implementations have the problem that a single long > > > > break results in one irq per character frame time. The code in > > > > fsl8250_handle_irq() that is supposed to handle that uses the BI bit in > > > > lsr_saved_flags to detect such a situation and then skip the second > > > > received character. However it also stores other error bits and so after > > > > a single frame error the character received in the next irq handling is > > > > passed to the upper layer with a frame error, too. > > > > > > > > To weaken this problem restrict saving LSR to only the BI bit. > > > > > > But what is missing is just what "this problem" is - what applications > > > are broken and how? What are the symptoms? This is a 15+ year old > > > platform and so one has to ask why this is just being seen now. > > > > The problem is "However it also stores other error bits and so after a > > single frame error the character received in the next irq handling is > > passed to the upper layer with a frame error, too." which is just the > > sentence before. I hoped this would be understandable :-\ > > I was trying to get you to describe symptoms at a higher level - as I > said above, at the application level - what were you using that wasn't > working that led you down the path to investigate this? Transfering > data wasn't reaching the expected max for baud rate, or serial console > was showing lags and dropped characters, or ...? The situation where the problem was noticed is: The 8313 is supposed to periodically receive a burst of a small (and fixed) number of characters. In the field it sometimes happend that there was a peak on the data line between two such telegrams which the UART interpreted as a character with a parity error. After that the first character of the next telegram wasn't received in userspace, because the driver claimed it was received with another parity error. So effectively a dropped character. > The false positive error bits description is fine, but it isn't > something that a person in the future could read and then say "Oh I'm > having the same problem - I should backport that!" > > > > > Note however that the handling is still broken: > > > > > > > > - lsr_saved_flags is updated using orig_lsr which is the LSR content > > > > for the first received char, but there might be more in the FIFO, so > > > > a character is thrown away that is received later and not necessarily > > > > the one following the break. > > > > - The doubled break might be the 2nd and 3rd char in the FIFO, so the > > > > workaround doesn't catch these, because serial8250_rx_chars() doesn't > > > > handle the workaround. > > > > - lsr_saved_flags might have set UART_LSR_BI at the entry of > > > > fsl8250_handle_irq() which doesn't originate from > > > > fsl8250_handle_irq()'s "up->lsr_saved_flags |= orig_lsr & > > > > UART_LSR_BI;" but from e.g. from serial8250_tx_empty(). > > > > - For a long or a short break this isn't about two characters, but more > > > > or only a single one. > > > > > > I've long since flushed the context required to parse the above, sorry. > > > But the part where it says "is still broken" stands out to me. > > > > The current state is (assuming the errata is accurate and I understood > > it correctly): > > - You get a problem for sure if there is a frame error (because the > > next good char is thrown away). > > - You get a problem for sure if there is a parity error (because the > > next good char is thrown away). > > - You get a problem for sure if there was an overflow (because the > > first good char after the overflow is thrown away). > > - The code is racy for break handling. In some unlikely cases the break > > workaround is applied wrongly. > > > > (Where "thrown away" is really: passed to the tty layer with an error > > indication, which depending on tty settings results in dropping the > > character or passing it on to userspace.) > > > > My patch only fixes the first three issues. A part of the reason for the > > uncomplete fix is that I don't have a platform that requires the workaround. > > (I thought I had, but it doesn't show the described behaviour and > > instead behaves nicely, i.e. one irq per break and no stray bits are > > set.) > > I was hoping that with the full description of the issue from 12+ years > ago that you'd be able to reproduce it on your platform with the WAR disabled. > I take it that you tried and SysRQ still worked fine? I think I did. I have to plan a bit of continous time to reverify. > I also found a copy of an earlier proposed fix from 2010 on patchworks: > http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20100301212324.GA1738@xxxxxxxxxxxxx/ > > Maybe there are some additional details in there of interest? > > I wonder if some other intervening change in that wide time span happens > to mask the issue? Who knows. I'm not sure if you are that interested; > enough to go try an old kernel to find out... > > > That the patch I did is correct is quite obvious for me. Currently the > > fsl8250_handle_irq() function sets the bits BI, OE, FE and PE in > > If I recall correctly, it just clears BI - are you sure it sets bits? Not explicitly, but it does orig_lsr = up->port.serial_in(&up->port, UART_LSR); ... up->lsr_saved_flags = orig_lsr; So whatever error bit is read on function entry is reused for the first char in the next irq run. > > lsr_saved_flags, but only evaluates BI for the workaround. The commit > > that introduced that only talks about BI, the mentioned erratum also > > only mentions BI. > > > > I can try to make the commit log more convincing. Or if the ability to > > test this code on an affected platform is declared a requirement, I will > > I'm not in any position to declare any requirements - just that when you > are bit-bashing to work around some "black box" silicon errata, any > changes might impact whether the WAR is still working or not. > > Your change alters lsr_saved_flags for *every* event, even when no breaks > or workarounds have been in play. I'm not sure what that might trigger. > > > drop the topic, just stop using fsl8250_handle_irq() without feeling sad. > > That might be the best option in the end but I did notice something else > you might want to consider. I believe the fsl8250_handle_irq() was just > a copy of the generic serial8250_handle_irq() as it was in 2011, with > the single block of code inserted for the WAR: > > + /* This is the WAR; if last event was BRK, then read and return */ > + if (unlikely(up->lsr_saved_flags & UART_LSR_BI)) { > + up->lsr_saved_flags &= ~UART_LSR_BI; > + port->serial_in(port, UART_RX); > + spin_unlock_irqrestore(&up->port.lock, flags); > + return 1; > + } > > Of course as we all know - when you copy something, you risk being left > behind when the original gets updated. I just took a look at today's > generic 8250 one -- "git blame drivers/tty/serial/8250/8250_port.c" and > there are changes that probably have left fsl8250_handle_irq() being > left behind. A bit more detective work would be required to see > changes prior to the refactoring in the 2015 commit of b6830f6df891. > > It probably would be worthwhile to return fsl8250_handle_irq() to be the > "equivalent" of serial8250_handle_irq() + WAR as it was originally. It > would be hard to argue against mainlining such changes - they are table > stakes. And who knows, with a bit of luck it might solve your issue too? > Yeah, I already looked into these and part of my plan to fix the workaround was to bring fsl8250_handle_irq() on par with the generic irq handler routine. Effectively there isn't missing much. > Of couse that is more effort than to just stop using the workaround, so I > wouldn't blame you at all if you decided to go that route. Will discuss that with my customer how much effort to put into this. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König | Industrial Linux Solutions | https://www.pengutronix.de/ |
Attachment:
signature.asc
Description: PGP signature