I am working with a Marvell 7042 controller and a SiI3276 port multiplier (PMP) and would like to handle asynchronous notification (AN) properly. However, if a command is outstanding when the PMP raises an AN, the port is frozen, which prevents the _autopsy_ error code from doing its work.

For example, here is a case where a disk behind a port multiplier has a power glitch while a command is outstanding. The PMP detects the signal loss and sends an AN. In sata_mv.c, mv_err_intr() is called and detects the notification: it pushes info into the error descriptor and calls ata_port_schedule_eh() via sata_async_notification().

However, when we enter ata_scsi_error() with a command outstanding, __ata_port_freeze() is called, which prevents sata_scr_read() from succeeding in ata_eh_link_autopsy():

Feb 25 02:11:57 bdfl11 kernel: ata4.00: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.01: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.02: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.03: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.04: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.05: failed to read SCR 1 (Emask=0x40)
Feb 25 02:11:57 bdfl11 kernel: ata4.15: exception Emask 0x4 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.15: edma_err_cause=02000100 pp_flags=00000005, fis_cause=00008200
Feb 25 02:11:57 bdfl11 kernel: ata4.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.02: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.03: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.04: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.04: cmd ca/00:80:e7:78:56/00:00:00:00:00/e8 tag 3 dma 65536 out
Feb 25 02:11:57 bdfl11 kernel: res 50/00:00:4e:10:45/00:00:00:00:00/e8 Emask 0x4 (timeout)
Feb 25 02:11:57 bdfl11 kernel: ata4.04: status: { DRDY }
Feb 25 02:11:57 bdfl11 kernel: ata4.05: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Feb 25 02:11:57 bdfl11 kernel: ata4.15: hard resetting link
Feb 25 02:11:58 bdfl11 kernel: ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Feb 25 02:11:58 bdfl11 kernel: ata4.00: hard resetting link

I haven't found the right solution to this problem yet:

1: Removing __ata_port_freeze() from ata_scsi_error() unilaterally is very dangerous: it opens a new race condition and may schedule the error handler several times.

2: In sata_mv, we cannot wait for commands to complete like we do for NCQ, because in the case above the command sent to the failed disk will never come back.

I am thinking of:
- waiting for all I/O to complete on all ports except the impacted one(s),
- adding a new action to the ehi descriptor to indicate that an AN is scheduled, and
- preventing the error from freezing the port if only I/Os to the failed ports are outstanding.

The _autopsy_ code would then collect and decode the SError register for the failed port.

Is this the right approach?

Thanks,
Gwendal.
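
To make the proposed freeze decision concrete, here is a minimal user-space sketch. It is only a model: struct link_model, struct port_model, the an_scheduled flag and should_freeze() are hypothetical names, not libata structures or functions. Falling back to a freeze when healthy links still have I/O in flight is also an assumption about how the wait step would be bounded.

```c
#include <stdbool.h>

#define MAX_PMP_LINKS 15	/* model value: device links behind one PMP */

/* Hypothetical per-link state; not a libata structure. */
struct link_model {
	unsigned int outstanding;	/* commands in flight on this link */
	bool an_reported;		/* link was flagged by the AN */
};

/* Hypothetical per-port state; not a libata structure. */
struct port_model {
	struct link_model link[MAX_PMP_LINKS];
	int nr_links;
	bool an_scheduled;	/* stands in for the proposed new EH action */
};

/* True while a link that was NOT flagged by the AN still has I/O in flight. */
static bool healthy_io_pending(const struct port_model *p)
{
	for (int i = 0; i < p->nr_links; i++)
		if (!p->link[i].an_reported && p->link[i].outstanding)
			return true;
	return false;
}

/* True if any link on the port has I/O in flight. */
static bool any_io_pending(const struct port_model *p)
{
	for (int i = 0; i < p->nr_links; i++)
		if (p->link[i].outstanding)
			return true;
	return false;
}

/*
 * Decide whether entering EH should freeze the port.
 * - No AN scheduled: freeze whenever a command is outstanding
 *   (the behaviour described in this mail).
 * - AN scheduled: skip the freeze only when every outstanding
 *   command belongs to a link flagged by the AN, so that the
 *   autopsy code can still read the SCRs.
 */
static bool should_freeze(const struct port_model *p)
{
	if (!any_io_pending(p))
		return false;
	if (!p->an_scheduled)
		return true;
	return healthy_io_pending(p);
}
```

In this model the caller would first wait until healthy_io_pending() returns false, then call should_freeze(); with only the failed disk's command left outstanding, the port stays unfrozen and the SCR reads can proceed.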