Re: Anyone else having Problems with lots of dying Seagate Exos X18 18TB Drives ?

Paul Mezzanini <pfmeec@xxxxxxx> · Wed, 7 Dec 2022 19:54:31 +0000

I started installing the SAS version of these drives two years ago in our cluster and I haven't had one fail yet.  I've been working on replacing every spinner we have with them.  I know it's not helping you figure out what is going on in your environment but hopefully a "the drive works for me" data point helps somehow.

-paul

________________________________________
From: Christoph Adomeit <Christoph.Adomeit@xxxxxxxxxxx>
Sent: Wednesday, December 7, 2022 9:16 AM
To: ceph-users@xxxxxxx
Subject:  Anyone else having Problems with lots of dying Seagate Exos X18 18TB Drives ?

Hi,

I am using Seagate Exos X18 18TB drives in a Ceph archive cluster which is mainly
write-once/read-sometimes.

The drives are about 6 months old.

I use them in a Ceph cluster and also in a ZFS server. These are different servers
(all Supermicro) with different controllers, but all of type LSI SAS3008.

Over the last few weeks these drives have been experiencing massive read errors and are
dying one after another.

The dmesg output looks like this:

[  418.546245] sd 0:0:35:0: [sdai] tag#1756 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=7s
[  418.548685] sd 0:0:35:0: [sdai] tag#1756 Sense Key : Medium Error [current]
[  418.549626] sd 0:0:35:0: [sdai] tag#1756 Add. Sense: Unrecovered read error
[  418.550507] sd 0:0:35:0: [sdai] tag#1756 CDB: Read(16) 88 00 00 00 00 00 00 00 08 00 00 00 00 20 00 00
[  418.552048] blk_update_request: critical medium error, dev sdai, sector 2048 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[  420.514045] sd 0:0:35:0: [sdai] tag#1677 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=1s
[  420.518341] sd 0:0:35:0: [sdai] tag#1677 Sense Key : Medium Error [current]
[  420.520766] sd 0:0:35:0: [sdai] tag#1677 Add. Sense: Unrecovered read error
[  420.523222] sd 0:0:35:0: [sdai] tag#1677 CDB: Read(16) 88 00 00 00 00 00 00 00 08 00 00 00 00 08 00 00
[  420.524770] blk_update_request: critical medium error, dev sdai, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
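
For anyone reading the log above, the CDB bytes can be decoded to see which blocks
each failed read was asking for. This is only a minimal Python sketch, assuming the
standard 16-byte READ(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9,
4-byte transfer length in bytes 10-13). The function name decode_read16 is made up
for illustration.

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode the LBA and transfer length from a READ(16) CDB hex dump."""
    # bytes.fromhex accepts space-separated hex pairs as printed in dmesg
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    # Bytes 2-9: logical block address, big-endian
    lba = int.from_bytes(cdb[2:10], "big")
    # Bytes 10-13: transfer length in logical blocks, big-endian
    blocks = int.from_bytes(cdb[10:14], "big")
    return {"lba": lba, "blocks": blocks}


if __name__ == "__main__":
    # The two CDBs from the dmesg output above
    print(decode_read16("88 00 00 00 00 00 00 00 08 00 00 00 00 20 00 00"))
    print(decode_read16("88 00 00 00 00 00 00 00 08 00 00 00 00 08 00 00"))
```

Both commands decode to LBA 2048 (32 blocks and 8 blocks respectively), which
matches the sector 2048 that the kernel reports in the blk_update_request lines.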

In the ZFS output I could see the disks starting with a few read errors and then advancing
to several thousand errors.

Seagate told me to put the drives in a Windows or Apple computer; otherwise they cannot help.

Is anyone else having such disk problems, or am I the only one?

ZFS output:

        NAME                                   STATE     READ WRITE CKSUM
        tank                                   DEGRADED     0     0     0
          raidz2-0                             DEGRADED   218     0     0
            ata-ST18000NM000J-2TV103_ZR53Z7LE  DEGRADED     0     0 3.52K  too many errors
            ata-ST18000NM000J-2TV103_ZR53Z4VY  DEGRADED     0     0 3.52K  too many errors
            ata-ST18000NM000J-2TV103_ZR53Z56R  DEGRADED     0     0 3.52K  too many errors
            ata-ST18000NM000J-2TV103_ZR53YW1R  DEGRADED     0     0 3.52K  too many errors
            ata-ST18000NM000J-2TV103_ZR53YF19  DEGRADED     0     0 3.52K  too many errors
            ata-ST18000NM000J-2TV103_ZR53YLKX  DEGRADED     0     0 3.52K  too many errors
            ata-ST18000NM000J-2TV103_ZR53Z6P9  DEGRADED     0     0 3.52K  too many errors
            ata-ST18000NM000J-2TV103_ZR53Z773  DEGRADED     0     0 1.52K  too many errors
            ata-ST18000NM000J-2TV103_ZR53Y4ND  DEGRADED     0     0 1.52K  too many errors
            ata-ST18000NM000J-2TV103_ZR53YLSZ  DEGRADED     0     0 3.13K  too many errors
            ata-ST18000NM000J-2TV103_ZR53Z5VZ  DEGRADED     0     0 3.13K  too many errors

Any ideas ?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx