Re: [PATCH] serial: Fail TIOCMGET/SET if port is suspended

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> writes:

> Hi Gabriel,
>
> On 11/11/2015 09:18 AM, Peter Hurley wrote:
>> On 11/11/2015 08:15 AM, Gabriel Krisman Bertazi wrote:
>>> When a PCI error is detected, reset code in 8250_pci.c suspends the uart
>>> ports to block IO while recovery procedure is performed.  Nevertheless,
>>> TIOCMGET and TIOCMSET ioctls are still able to reach the device even
>>> when it is suspended, which might cause EEH recovery of serial adapters
>>> to fail on Power systems.  This patch blocks any TIOCMGET/TIOCMSET calls
>>> from reaching the device if the port is suspended, to avoid racing with
>>> the recovery procedure.
>> 
>> This really should be handled by the 8250_pci sub-driver ticking
>> TTY_IO_ERROR so that almost all tty operations fail (a minor bug fix
>> to properly order the clearing of ASYNC_INITIALIZED w/ setting TTY_IO_ERROR
>> in uart_shutdown() would be req'd).
>> 
>> Unfortunately the machinery to do that doesn't exist.
>> I could trivially add a means given a uart_port ptr but the difficulty
>> of _safely_ getting a uart_port from line index still remains;
>> serial8250_get_port() should have just added a full ref count mechanism.
>
> Re-checking the 8250_pci sub-driver, I see it goes right through
> serial8250_suspend_port() which assumes the 8250 port will not be unregistered
> concurrently, so I assume the PCI subsystem serializes i/o error handling
> with device removal, in which case the translation from
> line => uart_8250_port => uart_port is straightforward.
>
> Below is the serial core machinery that can be used to tick TTY_IO_ERROR.

Sorry for the late response.

I've been testing both solutions for a while now and they still doesn't
seem to do a good job on stopping access to a device being recovered.
Ticking TTY_IO_ERROR caused my test program to reopen the port, which
went straight to serial8250_do_startup, accessed the device and crashed
the recovery again.  I fixed that, but other places also access the
device even though TTY_IO_ERROR is set.  Since
uart_startup/uart_shutdown sets TTY_IO_ERROR as well, we'd need to check
if we're already in an error condition during startup/shutdown, and
abort initialization/shutdown.  Considering all of that, I'm starting to
second guess the idea of using TTY_IO_ERROR for this purpose.

I'm considering another fix that calls pciserial_remove_ports when
detecting a PCI error, which would eventually call uart_remove_one_port,
and actually detach the port from the serial layer.  Then, I can
register the port again after recovery, similarly to what is done in
pciserial_init_ports during probe.  This is how other serial driver
(jsm) handles PCI errors.

Anyway, I'm gonna have to investigate a little further, but it would be
nice to hear from you if I should insist on TTY_IO_ERROR or move to
removing the port entirely instead of suspending.

Thanks!

-- 
Gabriel Krisman Bertazi

--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux