On 9/3/22 01:19, John Garry wrote: > Hi Damien, > >>>> >>>> But the pm8001 manual and current driver indicate that the >>>> OPC_INB_SATA_ABORT command should be sent after read log ext when >>>> handling NCQ error, regardless of an autopsy. I send OPC_INB_SATA_ABORT >>>> in ata_eh_reset() -> pm8001_I_T_nexus_reset() -> pm8001_send_abort_all() >>> You lost me: ata_eh_recover() will call ata_eh_reset() only if the >>> ATA_EH_RESET >>> action flag is set. So are you saying that even though it is not >>> needed, you >>> still need to set ATA_EH_RESET for pm8001 ? >> >> As below, it was the only location I found suitable to call >> pm8001_send_abort_all(). >> >> However I am not really sure it is required now. For pm8001 NCQ error >> handling we require 2x steps: >> - read log ext >> - Send OPC_INB_SATA_ABORT - we do this in pm8001_send_abort_all() >> >> pm8001_send_abort_all() sends OPC_INB_SATA_ABORT in "device abort all" >> mode, meaning any IO in the HBA is aborted for the device. But we are >> also earlier in EH sending OPC_INB_SATA_ABORT for individual IOs in >> sas_eh_handle_sas_errors() -> sas_scsi_find_task() -> >> pm8001_abort_task() -> sas_execute_internal_abort_single() -> ... >> send_abort_task() >> >> So I don't think that the pm8001_send_abort_all() call has any effect, >> as we're already aborting any outstanding IO earlier. >> >> Admittedly the order of the 2x steps is different, but >> OPC_INB_SATA_ABORT does not send any protocol message to the disk, so >> would not affect anything subsequently read with read log ext. >> >> Having said all that, it may be wise to still send >> pm8001_send_abort_all().. > > Have you had a chance to consider all this yet? > > I am now starting to think that it is not necessary to call > pm8001_send_abort_all(), and instead rely only on > sas_eh_handle_sas_errors() -> sas_scsi_find_task() -> > pm8001_abort_task() -> sas_execute_internal_abort_single() -> ... -> > send_abort_task() to abort each outstanding IO. Then we would not have > the issue of forcing the reset in $subject (to lead to calling > pm8001_send_abort_all()). > > This idea could simply be tested by stubbing pm8001_send_abort_all() > (and dropping the |= ATA_EH_RESET in $subject). Send a re-spinned patch and I will test :) > > Thanks, > John > -- Damien Le Moal Western Digital Research