On Tue, Oct 23, 2012 at 3:01 AM, Alan Cox <alan@xxxxxxxxxxxxxxxxxxx> wrote:
> > Added module parameter skip_rdi_check to opt out this workaround.
>
> NAK. Anything like this should be runtime.

One can echo 1 (or 0) to /sys/module/8250/parameters/skip_rdi_check at
run time to turn the workaround off (or on) dynamically. Does that count
as runtime?

> > Tested on Radisys ATCA 46XX which uses FPGA 16550-compatible and
> > other generic 16550 UART. It takes from an hour to days to reproduce by
> > pumping inputs to serial console continously using TeraTerm script:
>
> You turn this on by default but it's a nasty IRQ latency penalty
> on a lot of x86 platforms with the uarts on the lpc bus.

I agree. Would this patch be more acceptable if the default were off? I
can't narrow it down to specific hardware, since these are all generic
UARTs.

> What I am not clear on from this is
>
> - do you see it on both the ports (the bug that is)

No. Each piece of hardware has only one serial console port carrying
traffic, and only one of the two symptoms occurs on each type of
hardware. That is, hardware 1 ttyS0 hits "too much work for irq", and
hardware 2 ttyS0 has a console freeze under a separate test. I grouped
them together because they occur with the same console flooding test
script and share a similar RDI root cause.

> - if you do see it on both are you sure its not in reality a symptom of
>   some other console/irq handling race ?

It is a race. For "too much work for irq", here is the sequence of
events as analyzed by a Motorola engineer:

1) Data arrives in the FIFO, but not enough to cause an interrupt
2) The transmitter is started.
3) A transmit-needs-data interrupt occurs (0xC2 in the IIR)
4) The processing function is called and it reads the LSR
5) The LSR indicates that the transmitter needs data, but also
   indicates the presence of data in the FIFO (0x61 in the LSR)
6) The processing function receives the characters, and outputs data
   to the FIFO
7) At the exact time (a very, very small window) that the character is
   read from the FIFO, the FIFO timeout occurs, locking in an interrupt
   cause
8) The next loop through the interrupt code begins
9) The IIR now indicates the data timeout interrupt (0xCC in the IIR)
10) The processing function is called and it reads the LSR
11) The LSR is 0, indicating nothing to do
12) The interrupt loop continues (the IIR won't clear until a character
    is pulled) until it reaches its max count and displays the error.

The other symptom, the console freeze, is caused by a similar sequence.
The last interrupt before interrupts stop always shows IIR=0xC2 and
LSR=0x21, that is, a transmit interrupt with both transmit and receive
status set in the LSR. After interrupts stop, I insmod a module to force
a read: IIR=0xC6, IER=0x0F, still no interrupt. Then I read LSR=0xE3,
which is what the next interrupt would have done, and that makes
interrupts resume. Instead of forcing an LSR read, I can also resume
interrupts by forcing a printk, which triggers a new transmit interrupt
that reads the LSR anyway.

>
> Alan
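For reference, the IIR and LSR values quoted above can be decoded with a
small standalone helper. This is only an illustrative sketch, not code
from the patch: the bit positions follow the standard 16550 register
layout, while the macro names and the functions iir_id(), iir_name() and
rx_irq_without_data() are hypothetical names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard 16550 IIR fields (macro names are local to this sketch). */
#define IIR_NO_INT   0x01  /* bit 0 set: no interrupt pending */
#define IIR_ID_MASK  0x0e  /* bits 3:1: interrupt cause */
#define IIR_MSI      0x00  /* modem status change */
#define IIR_THRI     0x02  /* transmit holding register empty */
#define IIR_RDI      0x04  /* receive data available */
#define IIR_RLSI     0x06  /* receiver line status */
#define IIR_RX_TMO   0x0c  /* receive FIFO character timeout */

/* Standard 16550 LSR bits (macro names are local to this sketch). */
#define LSR_DR       0x01  /* receive data ready */
#define LSR_OE       0x02  /* overrun error */
#define LSR_THRE     0x20  /* transmit holding register empty */
#define LSR_TEMT     0x40  /* transmitter empty */
#define LSR_FIFOE    0x80  /* error somewhere in the receive FIFO */

/* Return the pending interrupt cause (IIR bits 3:1), or -1 if none. */
static inline int iir_id(uint8_t iir)
{
    if (iir & IIR_NO_INT)
        return -1;
    return iir & IIR_ID_MASK;
}

/* Human-readable name for the pending interrupt cause. */
static inline const char *iir_name(uint8_t iir)
{
    switch (iir_id(iir)) {
    case -1:         return "none";
    case IIR_MSI:    return "modem-status";
    case IIR_THRI:   return "tx-empty";
    case IIR_RDI:    return "rx-data";
    case IIR_RLSI:   return "line-status";
    case IIR_RX_TMO: return "rx-timeout";
    default:         return "other";
    }
}

/*
 * True for the state described in steps 9 to 11: the IIR reports a
 * receive interrupt (data or timeout), but the LSR has no data-ready
 * bit, so a handler that only trusts the LSR never reads the FIFO.
 */
static inline bool rx_irq_without_data(uint8_t iir, uint8_t lsr)
{
    int id = iir_id(iir);

    return (id == IIR_RDI || id == IIR_RX_TMO) && !(lsr & LSR_DR);
}
```

With these definitions, 0xC2 decodes as a transmit interrupt, 0xCC as a
receive timeout, and 0xC6 as receiver line status, which fits the
overrun bit set in LSR=0xE3. rx_irq_without_data(0xCC, 0x00) is true,
matching the looping state in steps 9 to 11.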