On Mon, Nov 04, 2024 at 12:01:19PM +0800, yangxingui wrote:

(snip)

> After testing, the issues we encountered were resolved.

That is good news :)

>
> But the kernel prints the following log:
>
> [246993.392832] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
> [246993.392839] sas: ata5: end_device-4:0: cmd error handler
> [246993.392855] sas: ata5: end_device-4:0: dev error handler
> [246993.392860] sas: ata6: end_device-4:3: dev error handler
> [246993.392863] sas: ata7: end_device-4:4: dev error handler
> [246993.606491] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
>
> And because the current EH will set the host to the recovery state,
> when we test and execute the smartctl command, it will affect the
> performance of all other disks under the same host.
>
> Perhaps we can continue to improve the EH mechanism that Wenchao
> tried to do before, and implement EH for a single disk. After a
> single disk enters EH, it may not affect other disks under the same
> host.
>
> https://lore.kernel.org/linux-scsi/20230901094127.2010873-1-haowenchao2@xxxxxxxxxx/

That is bad news :(

Considering that this series will currently stall all other disks under
the same host, this series is currently not a viable solution to the
problem that you have reported (NCQ commands can starve out non-NCQ
commands).

Looking at:
https://lore.kernel.org/linux-scsi/20230901094127.2010873-1-haowenchao2@xxxxxxxxxx/

It appears that a requirement for Wenchao's series to land, is that
Hannes's EH rework series:
https://lore.kernel.org/linux-scsi/20231023092837.33786-1-hare@xxxxxxx/
lands first.

Unless these two SCSI series get merged first, it's illogical to carry
this increased complexity in libata.

If these two SCSI series ever get merged, then the series in $subject
would be a viable solution to the problem, and the extra complexity
would be justified.


Kind regards,
Niklas