Anyone else having problems with lots of dying Seagate Exos X18 18TB drives?

Hi,

I am using Seagate Exos X18 18TB drives in a Ceph archive cluster whose workload is mainly
write once / read sometimes.

The drives are about 6 months old.

I use them in a Ceph cluster and also in a ZFS server. These are different servers
(all Supermicro) with different controllers, but all of type LSI SAS3008.

In the last few weeks these drives have been experiencing massive read errors and are
dying one after another.
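
On the Ceph side I don't have logs at hand, so the commands below are just what I would
use to check a cluster for the same pattern; they are generic and not output from this
incident:

	ceph health detail                    # scrub errors / inconsistent PGs, if any
	ceph osd tree down                    # OSDs that have already dropped out
	dmesg -T | grep -i 'medium error'     # kernel-side read errors on the OSD host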

The dmesg output looks like this:

[  418.546245] sd 0:0:35:0: [sdai] tag#1756 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=7s
[  418.548685] sd 0:0:35:0: [sdai] tag#1756 Sense Key : Medium Error [current]
[  418.549626] sd 0:0:35:0: [sdai] tag#1756 Add. Sense: Unrecovered read error
[  418.550507] sd 0:0:35:0: [sdai] tag#1756 CDB: Read(16) 88 00 00 00 00 00 00 00 08 00 00 00 00 20 00 00
[  418.552048] blk_update_request: critical medium error, dev sdai, sector 2048 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[  420.514045] sd 0:0:35:0: [sdai] tag#1677 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=1s
[  420.518341] sd 0:0:35:0: [sdai] tag#1677 Sense Key : Medium Error [current]
[  420.520766] sd 0:0:35:0: [sdai] tag#1677 Add. Sense: Unrecovered read error
[  420.523222] sd 0:0:35:0: [sdai] tag#1677 CDB: Read(16) 88 00 00 00 00 00 00 00 08 00 00 00 00 08 00 00
[  420.524770] blk_update_request: critical medium error, dev sdai, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
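
To cross-check that these really are media errors on the drive (and not a controller or
cabling problem), the per-disk SMART data can be pulled; sdai is the device from the log
above, the commands themselves are generic smartctl calls:

	smartctl -x /dev/sdai          # full SMART attributes, device statistics and error log
	smartctl -l error /dev/sdai    # just the drive's own error log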


In the ZFS output I could see the disks start with a few read errors and then advance
to some thousands of errors.

Seagate told me to put the drives in a Windows or Apple computer, otherwise they cannot help.

Anyone else having such disk problems, or am I the only one?

ZFS output:

	NAME                                   STATE     READ WRITE CKSUM
	tank                                   DEGRADED     0     0     0
	  raidz2-0                             DEGRADED   218     0     0
	    ata-ST18000NM000J-2TV103_ZR53Z7LE  DEGRADED     0     0 3.52K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53Z4VY  DEGRADED     0     0 3.52K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53Z56R  DEGRADED     0     0 3.52K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53YW1R  DEGRADED     0     0 3.52K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53YF19  DEGRADED     0     0 3.52K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53YLKX  DEGRADED     0     0 3.52K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53Z6P9  DEGRADED     0     0 3.52K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53Z773  DEGRADED     0     0 1.52K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53Y4ND  DEGRADED     0     0 1.52K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53YLSZ  DEGRADED     0     0 3.13K  too many errors
	    ata-ST18000NM000J-2TV103_ZR53Z5VZ  DEGRADED     0     0 3.13K  too many errors
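
(The by-id names above map back to the sdX names from dmesg via the /dev/disk/by-id
symlinks, e.g.

	readlink -f /dev/disk/by-id/ata-ST18000NM000J-2TV103_ZR53Z7LE

which resolves to the kernel device, in case anyone wants to run smartctl against a
specific degraded member.)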

Any ideas?



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


