Last year, I sent a couple of patches trying to improve the xilinx_uartps driver. Now I was allocated to work on this driver again. Unfortunately, I don't have ready to merge patches at this point. The original symptom that made me investigate the xilinx_uartps driver was occasional kernel hangs and garbage coming out of the console. While trying to reproduce the issue on a current kernel, I no longer encountered any hangs, but I was able to reproduce the garbage. The driver has an interrupt-driven mode (used when opening the console device) and a polling mode (used by kmsg). The issue is easily reproducible by rapidly writing to /dev/kmsg when having commit e3538c37ee38 ("tty: xuartps: Beautify read-modify writes") reverted and nothing has opened the console. In that case, the transmitter is initially disabled and cdns_uart_console_write disables the transmitter after writing (due to the reversion). Without the reversion, it still is reproducible much less reliably by writing to kmsg and then opening the port (which temporarily disables the transmitter in cdns_uart_startup for reconfiguration). From all that I have seen, it seems that disabling the transmitter too early after writing data to the fifo is the cause. Whatever "too early" means. On a typical remote USB uart, the symptom looks as if 1 to 6 characters before finishing a cdns_uart_console_write (e.g. writing a line to kmsg), garbage characters (often with high bits) are received. This continues for 10 to 80 characters into the next write. Upon inspection with an oscilloscope, there is a different picture. At the start of where the USB uart sees garbage, a character takes slightly longer than expected. After the erroneous character, transmission continues normally. It seems that the USB uart temporarily desynchronizes and reads overlapping characters, which I perceive as garbage. In 2016, I sent a patch[1] for waiting longer after a cdns_uart_console_write. It not only waits for the transmitter fifo to empty (as the current driver does), but also waits for the TACTIVE bit in the channel status register to clear. The current driver does not look at the TACTIVE bit at all. Unfortunately, I am still able to rarely reproduce the symptoms with this patch applied. It seems as if the device is still not finished transmitting even after indicating TXEMPTY and clearing TACTIVE. Increasing the wait beyond makes the symptoms disappear entirely. Figuring how long to wait is the difficult part though. It seems that after TXEMPTY is indicated, waiting three times as long as it takes to clear TACTIVE is sufficient at a baud rate of 115200. What is a good way to move on from here? I can send a patch that increases the wait time using the aforementioned heuristic, but that still doesn't tell at all why the issue happens and how long one actually needs to wait. Any ideas? I also sent a patch[2] that fixes a typo in cdns_uart_startup, that causes the transmitter to be disabled. It still is applicable today and I think it should be applied. I also proposed making the logic for enabling/disabling the transmitter more consistent (after commit e3538c37ee38 changed it in a non-obvious way). In a discussing with Sören Brinkmann, we agreed that the transmitter should be remain enabled after a write. Cleaning up other logic to be consistent with that decision is what another earlier patch[3] does. Should I try resending any of these patches? Helmut [1] http://marc.info/?l=linux-serial&m=146977837419015 [2] http://marc.info/?l=linux-serial&m=146977916219251 [3] http://marc.info/?l=linux-serial&m=147012929717986 (includes patch [2]) -- To unsubscribe from this list: send the line "unsubscribe linux-serial" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html