Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> writes: > Hi Gabriel, > > On 11/11/2015 09:18 AM, Peter Hurley wrote: >> On 11/11/2015 08:15 AM, Gabriel Krisman Bertazi wrote: >>> When a PCI error is detected, reset code in 8250_pci.c suspends the uart >>> ports to block IO while recovery procedure is performed. Nevertheless, >>> TIOCMGET and TIOCMSET ioctls are still able to reach the device even >>> when it is suspended, which might cause EEH recovery of serial adapters >>> to fail on Power systems. This patch blocks any TIOCMGET/TIOCMSET calls >>> from reaching the device if the port is suspended, to avoid racing with >>> the recovery procedure. >> >> This really should be handled by the 8250_pci sub-driver ticking >> TTY_IO_ERROR so that almost all tty operations fail (a minor bug fix >> to properly order the clearing of ASYNC_INITIALIZED w/ setting TTY_IO_ERROR >> in uart_shutdown() would be req'd). >> >> Unfortunately the machinery to do that doesn't exist. >> I could trivially add a means given a uart_port ptr but the difficulty >> of _safely_ getting a uart_port from line index still remains; >> serial8250_get_port() should have just added a full ref count mechanism. > > Re-checking the 8250_pci sub-driver, I see it goes right through > serial8250_suspend_port() which assumes the 8250 port will not be unregistered > concurrently, so I assume the PCI subsystem serializes i/o error handling > with device removal, in which case the translation from > line => uart_8250_port => uart_port is straightforward. > > Below is the serial core machinery that can be used to tick TTY_IO_ERROR. Sorry for the late response. I've been testing both solutions for a while now and they still doesn't seem to do a good job on stopping access to a device being recovered. Ticking TTY_IO_ERROR caused my test program to reopen the port, which went straight to serial8250_do_startup, accessed the device and crashed the recovery again. I fixed that, but other places also access the device even though TTY_IO_ERROR is set. Since uart_startup/uart_shutdown sets TTY_IO_ERROR as well, we'd need to check if we're already in an error condition during startup/shutdown, and abort initialization/shutdown. Considering all of that, I'm starting to second guess the idea of using TTY_IO_ERROR for this purpose. I'm considering another fix that calls pciserial_remove_ports when detecting a PCI error, which would eventually call uart_remove_one_port, and actually detach the port from the serial layer. Then, I can register the port again after recovery, similarly to what is done in pciserial_init_ports during probe. This is how other serial driver (jsm) handles PCI errors. Anyway, I'm gonna have to investigate a little further, but it would be nice to hear from you if I should insist on TTY_IO_ERROR or move to removing the port entirely instead of suspending. Thanks! -- Gabriel Krisman Bertazi -- To unsubscribe from this list: send the line "unsubscribe linux-serial" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html