[PATCH v2] serial: 8250_dw: Avoid "too much work" from bogus rx timeout interrupt

o.schinagl@xxxxxxxxxxxxx (Olliver Schinagl) · Wed, 29 Mar 2017 11:45:41 +0200

Hey Andy,

On 29-03-17 11:11, Andy Shevchenko wrote:
> On Wed, Mar 29, 2017 at 10:58 AM, Olliver Schinagl <oliver at schinagl.nl> wrote:
>> On 07-02-17 00:30, Douglas Anderson wrote:
>
> First of all, I don't get why the people from the Cc list suddenly
> disappeared. Check your mail client settings.
> Adding some of them back.
Apologies, I replied via gmane's news feed to Douglas's initial post as
I did not have the original post and I failed to check the other
recipients. My fault. Sorry. I've added the other original recipients as well.

>
>>> It appears that somehow we have a RX Timeout interrupt but there is no
>>> actual data present to receive.  When we're in this state the UART
>>> driver claims that it handled the interrupt but it actually doesn't
>>> really do anything.  This means that we keep getting the interrupt
>>> over and over again.
>
>> I may be running into the same thing on an A20 SoC, but still in the stage
>> of figuring out what is going on, as we get this error very occasionally. Do
>> you have a way to externally induce this behavior other than suspend/resume?
>> As we get it during uart-use and do not have (or I have never tried)
>> suspend/resume on our platform.
>
> On Intel platforms with this IP I can see something similar when running
> a loopback test at high speeds.
> California may correct me since he did a lot of investigation of the
> issue on x86.
>
>>>  static int dw8250_handle_irq(struct uart_port *p)
>>>  {
>>> +       struct uart_8250_port *up = up_to_u8250p(p);
>>>         struct dw8250_data *d = p->private_data;
>>>         unsigned int iir = p->serial_in(p, UART_IIR);
>>> +       unsigned int status;
>>> +       unsigned long flags;
>>> +
>>> +       /*
>>> +        * There are ways to get Designware-based UARTs into a state where
>>> +        * they are asserting UART_IIR_RX_TIMEOUT but there is no actual
>>> +        * data available.  If we see such a case then we'll do a bogus
>>> +        * read.  If we don't do this then the "RX TIMEOUT" interrupt will
>>> +        * fire forever.
>>
>> I think what you are saying is 'do a bogus read as that is the only way to
>> clear the interrupt, otherwise it will keep firing forever.'?
>
> No, we don't know if this is _the only way_. It looks like none of us
> can tell you the root cause, except maybe the Synopsys guys.
Has anybody tried to contact Synopsys/DW about this issue at all?

True, it is not the only way (maybe only as far as we know for now), but
it is 'the' way currently.
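
To check my understanding of the workaround, here is a minimal userspace
sketch of the logic. It is not the driver code: struct fake_uart,
fake_serial_in() and fake_handle_irq() are hypothetical names, the 0x3f
IIR mask is my assumption, and the model simply assumes that a read of
UART_RX deasserts the timeout (which is what the patch relies on, not
something I found in the docs). The register offsets and bits are the
ones from serial_reg.h. The spinlock is left out since the model is
single-threaded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register offsets and bits, as defined in the kernel's serial_reg.h */
#define UART_RX             0
#define UART_IIR            2
#define UART_LSR            5
#define UART_IIR_NO_INT     0x01
#define UART_IIR_RX_TIMEOUT 0x0c
#define UART_LSR_DR         0x01
#define UART_LSR_BI         0x10

/* Hypothetical software model of the UART, for illustration only. */
struct fake_uart {
	uint8_t iir;
	uint8_t lsr;
	unsigned int rx_reads;	/* counts reads of the RX register */
};

static unsigned int fake_serial_in(struct fake_uart *u, int offset)
{
	switch (offset) {
	case UART_IIR:
		return u->iir;
	case UART_LSR:
		return u->lsr;
	case UART_RX:
		u->rx_reads++;
		/* Assumption: reading RX clears a pending RX timeout. */
		if ((u->iir & 0x3f) == UART_IIR_RX_TIMEOUT)
			u->iir = UART_IIR_NO_INT;
		u->lsr &= (uint8_t)~UART_LSR_DR;
		return 0;
	default:
		return 0;
	}
}

/* Returns true if the interrupt was handled. */
static bool fake_handle_irq(struct fake_uart *u)
{
	unsigned int iir = fake_serial_in(u, UART_IIR);

	if ((iir & 0x3f) == UART_IIR_RX_TIMEOUT) {
		unsigned int status = fake_serial_in(u, UART_LSR);

		/* Timeout asserted but nothing to receive: bogus read. */
		if (!(status & (UART_LSR_DR | UART_LSR_BI)))
			(void)fake_serial_in(u, UART_RX);

		/* Otherwise the data is left for the normal RX path. */
		return true;
	}

	return !(iir & UART_IIR_NO_INT);
}
```

With iir = UART_IIR_RX_TIMEOUT and lsr = 0, the first call does exactly
one RX read and the second call reports nothing to handle. With
UART_LSR_DR set, no bogus read happens.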
>
>>> +               spin_lock_irqsave(&p->lock, flags);
>>
>> this is a bit above my knowledge of drivers etc., but I don't see any spinlocks in
>> the 8250 handle_irq glue drivers, except in the OMAP's case where they are
>> handling a DMA IRQ. So I ask, because I don't know, why is it needed here?
>
> They serialize IO accessors.
>
> Regarding the rest of the comments, the patch is already upstream; if
> you feel that something should be changed, send an incremental fix.
Ah, I thought I had checked, but I didn't see it. I probably
forgot to fetch. I'll send a patch for the small mask fix.
>
>> Once I found a way to reproduce the problem (without suspend) I will test
>> this to see if it fixes it for us too.
>
> It would be appreciated, but it would be better to get to know the root
> cause and what the _hardware_ guys think about solutions.
>
I read over the docs of the IP block (dw_apb_uart of 2006; I know a
little FPGA programming) but found nothing yet that would warn about this
behavior. I suppose hardware/FPGA guys can potentially give more
background here, but it may also simply be an IP bug?

Olliver