[PATCH/RFC] RE: 8250 misses interrupt, stalls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



At Stratus we have also seen missed interrupts in 8250.c.  We
concluded these are due to serial driver UART_BUG_TXEN false positive
on SMP system as explained below with patches to fix this problem.

Symptom and Environment:

On an SMP x86_64 system that has 8 CPU cores, we see serial output to
16550A UARTs stalling.  When this situation occurs, the associated
tty_struct does not have stopped flags set and the uart_info->xmit
buffer is not empty.  If interrupts were occurring the data should be
sent to the UART.  Because the data is not being sent, it seems a
"transmitter holding register empty" interrupt (THRI) is getting lost
and therefore outgoing data stops.

We only see this bug on SMP systems.  When we boot a multi-core system
with maxcpus=1 the problem does not occur.  In these tests, a serial
console is on ttyS0 which uses IRQ4.  ttyS1 is used by pppd and it
uses IRQ3.  Neither IRQ is shared.  We can test by rebooting the
system and after the init scripts have run, looking at whether the
UART_BUG_TXEN bit is set for each tty.  With maxcpus=1 UART_BUG_TXEN
never gets set.  Without maxcpus on the kernel command line, we see
false positives in setting UART_BUG_TXEN perhaps 75 to 95 percent of
the time on ttyS1, and about 20 percent of the time on ttyS0.

The particular UARTs used in the Stratus system are part of the
PC87427 Server IO chip.  This chip was designed by National
Semiconductor and documentation on the NS web site says the chip is
compatible with 16450 and 16550A.  Now that business has transferred
to Winbond from NS; so those parts come from Winbond.  This is a
modern UART implementation that should not have the UART_BUG_TXEN.

We saw this problems in Red Hat Enterprise Linux 5.2, with kernel
linux-2.6.18-92.el5.  However from source code examination, it seems
this would also occur in linux 2.6.24 and later.

Analysis:
(line numbers from linux v2.6.24.4)

How UART_BUG_TXEN gets set due to a false positive on SMP systems ---
The UART is initialized by function serial8250_startup() in 8250.c.
At line 1860 the call to serial_link_irq_chain(up) connects the IRQ to
the ISR in this driver.  It is relevant that the ISR reads the IIR
before it tries to acquire the
up->port.lock spinlock and reading the IIR would clear THRI if it is
the interrupt cause thus breaking this detection logic that comes a
few lines later in serial8250_startup().  Line 1881 is the last step
necessary for the ISR to be
entered.
in serial8250_startup:
1860                retval = serial_link_irq_chain(up);
...
1874        } else
1875                /*
1876                 * Most PC uarts need OUT2 raised to enable interrupts.
1877                 */
1878                if (is_real_interrupt(up->port.irq))
1879                        up->port.mctrl |= TIOCM_OUT2;
1880
1881        serial8250_set_mctrl(&up->port, up->port.mctrl);
1882
1883        /*
1884         * Do a quick test to see if we receive an
1885         * interrupt when we enable the TX irq.
1886         */
1887        serial_outp(up, UART_IER, UART_IER_THRI);
1888        lsr = serial_in(up, UART_LSR);
1889        iir = serial_in(up, UART_IIR);
1890        serial_outp(up, UART_IER, 0);
1891
1892        if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
1893                if (!(up->bugs & UART_BUG_TXEN)) {
1894                        up->bugs |= UART_BUG_TXEN;
1895                        pr_debug("ttyS%d - enabling bad tx status
workarounds\n",
1896                                 port->line);
1897                }
1898        } else {
1899                up->bugs &= ~UART_BUG_TXEN;
1900        }
1901
1902        spin_unlock_irqrestore(&up->port.lock, flags);

Line 1887 causes an interrupt and the problem can occur when the ISR
is entered on another processor.  If ISR reads the IIR, clearing THRI,
before the IIR is read on line 1889, a false positive is detected.


How incorrectly detecting UART_BUG_TXEN causes output to stall ---

When usermode has more characters for the UART to transmit, the
characters are placed into the uart_info->xmit circular buffer and
serial8250_start_tx() gets called.  That function flows through the
following code path:

in serial8250_start_tx:
1241        struct uart_8250_port *up = (struct uart_8250_port *)port;
1242
1243        if (!(up->ier & UART_IER_THRI)) {
1244                up->ier |= UART_IER_THRI;
1245                serial_out(up, UART_IER, up->ier);
1246
1247                if (up->bugs & UART_BUG_TXEN) {
1248                        unsigned char lsr, iir;
1249                        lsr = serial_in(up, UART_LSR);
1250                        up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS;
1251                        iir = serial_in(up, UART_IIR) & 0x0f;
1252                        if ((up->port.type == PORT_RM9000) ?
1253                                (lsr & UART_LSR_THRE &&
1254                                (iir == UART_IIR_NO_INT || iir ==
UART_IIR_THRI)) :
1255                                (lsr & UART_LSR_TEMT && iir &
UART_IIR_NO_INT))
1256                                transmit_chars(up);
1257                }
1258        }

On NON-buggy UARTs line 1245 causes a THRI interrupt request.  If the
IIR has not been read by the ISR by the time it is read on line 1251,
the value read by line 1251 can indicate that THRI is pending; in this
case, reading the IIR would clear the THRI status, causing the
interrupt to get lost.  This value read from the IIR would not be
UART_IIR_NO_INT so that line 1256 would be bypassed; with no
characters sent to the transmitter in the UART another THRI interrupt
request would not occur.  Subsequent calls to this routine do nothing
because the  UART_IER_THRI bit is already set.  This causes output
stalls.


Proposed fixes ---

Here are two patches that are alternatives for fixing this problem:


The first patch was tested at Red Hat and Stratus.  It has been
released to users of Red Hat Enterprise Linux 5.2.
This patch takes the port's irq lock when starting up the UART to
provide mutual exclusion between the "quick test" in the startup code
and the interrupt service routine.

--- linux-2.6.18-92-old/drivers/serial/8250.c   2008-08-13
14:07:08.000000000 +0000
+++ linux-2.6.18-92-new/drivers/serial/8250.c   2008-08-13
14:04:12.000000000 +0000
@@ -1749,65 +1749,73 @@

               timeout = timeout > 6 ? (timeout / 2 - 2) : 1;

               up->timer.data = (unsigned long)up;
               mod_timer(&up->timer, jiffies + timeout);
       } else {
               retval = serial_link_irq_chain(up);
               if (retval)
                       return retval;
       }

       /*
        * Now, initialize the UART
        */
       serial_outp(up, UART_LCR, UART_LCR_WLEN8);

-       spin_lock_irqsave(&up->port.lock, flags);
+       if (is_real_interrupt(up->port.irq)) {
+               spin_lock_irqsave(&irq_lists[up->port.irq].lock, flags);
+               spin_lock(&up->port.lock);
+       } else
+               spin_lock_irqsave(&up->port.lock, flags);
       if (up->port.flags & UPF_FOURPORT) {
               if (!is_real_interrupt(up->port.irq))
                       up->port.mctrl |= TIOCM_OUT1;
       } else
               /*
                * Most PC uarts need OUT2 raised to enable interrupts.
                */
               if (is_real_interrupt(up->port.irq))
                       up->port.mctrl |= TIOCM_OUT2;

       serial8250_set_mctrl(&up->port, up->port.mctrl);

       /*
        * Do a quick test to see if we receive an
        * interrupt when we enable the TX irq.
        */
       serial_outp(up, UART_IER, UART_IER_THRI);
       lsr = serial_in(up, UART_LSR);
       iir = serial_in(up, UART_IIR);
       serial_outp(up, UART_IER, 0);

       if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
               if (!(up->bugs & UART_BUG_TXEN)) {
                       up->bugs |= UART_BUG_TXEN;
                       pr_debug("ttyS%d - enabling bad tx status workarounds\n",
                                port->line);
               }
       } else {
               up->bugs &= ~UART_BUG_TXEN;
       }

-       spin_unlock_irqrestore(&up->port.lock, flags);
+       if (is_real_interrupt(up->port.irq)) {
+               spin_unlock(&up->port.lock);
+               spin_unlock_irqrestore(&irq_lists[up->port.irq].lock, flags);
+       } else
+               spin_unlock_irqrestore(&up->port.lock, flags);

       /*
        * Finally, enable interrupts.  Note: Modem status interrupts
        * are set via set_termios(), which will be occurring imminently
        * anyway, so we don't enable them here.
        */
       up->ier = UART_IER_RLSI | UART_IER_RDI;
       serial_outp(up, UART_IER, up->ier);

       if (up->port.flags & UPF_FOURPORT) {
               unsigned int icp;
               /*
                * Enable interrupts on the AST Fourport board
                */
               icp = (up->port.iobase & 0xfe0) | 0x01f;
               outb_p(0x80, icp);






The second patch blocks the 16550A UART from asserting its IRQ during
the quick test in serial8250_startup previously discussed.  This patch
patch has only had limited testing at Stratus.

--- linux-2.6.24.4.old/drivers/serial/8250.c    2008-03-24
18:49:18.000000000 +0000
+++ linux-2.6.24.4.new/drivers/serial/8250.c    2008-04-16
19:40:41.000000000 +0000
@@ -1876,21 +1876,25 @@
                * Most PC uarts need OUT2 raised to enable interrupts.
                */
               if (is_real_interrupt(up->port.irq))
                       up->port.mctrl |= TIOCM_OUT2;

-       serial8250_set_mctrl(&up->port, up->port.mctrl);
+       /* Block IRQ to avoid false positive if SMP */
+       serial8250_set_mctrl(&up->port, 0);

       /*
        * Do a quick test to see if we receive an
        * interrupt when we enable the TX irq.
        */
       serial_outp(up, UART_IER, UART_IER_THRI);
       lsr = serial_in(up, UART_LSR);
       iir = serial_in(up, UART_IIR);
       serial_outp(up, UART_IER, 0);

+       /* Enable this UART to assert its IRQ */
+       serial8250_set_mctrl(&up->port, up->port.mctrl);
+
       if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT) {
               if (!(up->bugs & UART_BUG_TXEN)) {
                       up->bugs |= UART_BUG_TXEN;
                       pr_debug("ttyS%d - enabling bad tx status workarounds\n",
                                port->line);


Robert N. Evans
Software Engineer
STRATUS TECHNOLOGIES
111 Powdermill Road,
Maynard, MA 01754-3409  U.S.A.
--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux