Hello,

I've been chasing an issue with ATA link drop-outs and wanted to run this by some SMEs.

System Information
Distro: AlmaLinux 8.8
Kernel: 4.18.0-477.13.1
Arch: x64
OpenZFS Version: 2.1.5-1

The drop-outs occur with SSDs that are attached to Marvell 88SE9235 SATA controllers via Marvell 88SM9705 port multipliers. The SSDs are M.2 form factor and are typically models from WD or SanDisk.

When the issue occurs, communication with all five SSDs connected to the port multiplier is lost, and the driver performs recovery steps to re-establish the connection with them. This results in ZFS I/O errors being reported by zpool status. Multiple events in which the driver's recovery steps fail can lead to pool suspension.

The issue occurs with both small and large I/O workloads, though it usually takes longer to manifest with a small I/O workload.

The issue DOES NOT occur with an older version of CentOS and ZFS running on the same hardware:

System Information
Distribution: CentOS 7.9
Kernel Version: 3.10.0-1160.15.2
Architecture: x64
OpenZFS Version: 0.8.6-1

I have tried the following, in different combinations, but the issue still occurs:

Disabling NCQ
Lowering SATA speed to 3.0 Gbps
Upgrading ZFS to 2.1.13
Upgrading to AlmaLinux 8.9
Changing SATA power management from max_performance -> medium_power
Changing I/O scheduler from none -> mq-deadline
Changing max_sectors_kb -> 512

The issue can be reproduced as follows:

Small I/O workload: Boot the system with apps that generate a small sustained I/O load on the ZFS pool and let it run without interaction
Large I/O workload: Use fio to generate a heavy I/O workload on the ZFS pool

Partial snippet from syslog showing the initial messages when the drop-outs occur:

Dec 17 07:41:00.384 test01 kernel: ata7.00: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.01: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.02: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.03: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.04: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.00: exception Emask 0x100 SAct 0x4200000 SErr 0x0 action 0x6 frozen
Dec 17 07:41:00.384 test01 kernel: ata7.00: failed command: WRITE FPDMA QUEUED
Dec 17 07:41:00.384 test01 kernel: ata7.00: cmd 61/0b:a8:da:66:d1/00:00:08:00:00/40 tag 21 ncq dma 5632 out res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 17 07:41:00.384 test01 kernel: ata7.00: status: { DRDY }
Dec 17 07:41:00.384 test01 kernel: ata7.00: failed command: WRITE FPDMA QUEUED
Dec 17 07:41:00.384 test01 kernel: ata7.00: cmd 61/15:d0:28:26:fe/00:00:06:00:00/40 tag 26 ncq dma 10752 out res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 17 07:41:00.384 test01 kernel: ata7.00: status: { DRDY }

Any input on this would be greatly appreciated!
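
In case it helps with reading the syslog excerpt, here is a minimal Python sketch that summarizes such messages: which port-multiplier links failed the SCR read, which NCQ tags were outstanding according to SAct, and which commands timed out. The helper names (set_bits, decode_emask, summarize) are hypothetical, and the EMASK_BITS table reflects my reading of the AC_ERR_* flags in include/linux/libata.h, so treat it as an assumption and check it against the source of the kernel actually running.

```python
import re
from collections import Counter

# A few lines copied from the syslog excerpt in this post.
LOG = """\
Dec 17 07:41:00.384 test01 kernel: ata7.00: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.01: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.02: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.03: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.04: failed to read SCR 1 (Emask=0x40)
Dec 17 07:41:00.384 test01 kernel: ata7.00: exception Emask 0x100 SAct 0x4200000 SErr 0x0 action 0x6 frozen
Dec 17 07:41:00.384 test01 kernel: ata7.00: cmd 61/0b:a8:da:66:d1/00:00:08:00:00/40 tag 21 ncq dma 5632 out res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 17 07:41:00.384 test01 kernel: ata7.00: cmd 61/15:d0:28:26:fe/00:00:06:00:00/40 tag 26 ncq dma 10752 out res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
"""

# ASSUMPTION: bit positions of the AC_ERR_* flags as I recall them from
# include/linux/libata.h; verify against your kernel tree before relying on it.
EMASK_BITS = {
    0: "dev", 1: "hsm", 2: "timeout", 3: "media", 4: "ata_bus",
    5: "host_bus", 6: "system", 7: "invalid", 8: "other",
    9: "nodev_hint", 10: "ncq",
}

SCR_RE = re.compile(r"(ata\d+\.\d+): failed to read SCR \d+ \(Emask=(0x[0-9a-f]+)\)")
EXC_RE = re.compile(r"(ata\d+\.\d+): exception Emask (0x[0-9a-f]+) SAct (0x[0-9a-f]+)")
CMD_RE = re.compile(r"cmd [0-9a-f/:]+ tag (\d+) ncq dma (\d+)")


def set_bits(value):
    """Return the positions of all bits set in value, lowest first."""
    return [b for b in range(value.bit_length()) if (value >> b) & 1]


def decode_emask(value):
    """Translate an Emask value into flag names using EMASK_BITS."""
    return [EMASK_BITS.get(b, f"bit{b}") for b in set_bits(value)]


def summarize(log):
    """Collect SCR read failures, outstanding NCQ tags, and failed commands."""
    scr_failures = Counter()   # link name -> number of SCR read failures
    sact_tags = []             # NCQ tags marked active in SAct
    commands = []              # (tag, transfer size in bytes) per failed command
    for line in log.splitlines():
        m = SCR_RE.search(line)
        if m:
            scr_failures[m.group(1)] += 1
        m = EXC_RE.search(line)
        if m:
            # Each set bit in SAct corresponds to one outstanding NCQ tag.
            sact_tags = set_bits(int(m.group(3), 16))
        m = CMD_RE.search(line)
        if m:
            commands.append((int(m.group(1)), int(m.group(2))))
    return {"scr_failures": scr_failures, "sact_tags": sact_tags, "commands": commands}


if __name__ == "__main__":
    result = summarize(LOG)
    print("links with SCR read failures:", sorted(result["scr_failures"]))
    print("tags active in SAct:", result["sact_tags"])
    print("failed commands (tag, bytes):", result["commands"])
```

For the excerpt in this post, SAct 0x4200000 has bits 21 and 26 set, which matches the two WRITE FPDMA QUEUED commands (tag 21 and tag 26) that timed out, and all five links ata7.00 through ata7.04 fail the SCR read at the same timestamp.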