Seagate ST12000NM0007 disk issue

Eyal Lebedinsky <eyal@xxxxxxxxxxxxxx> · Fri, 5 Apr 2019 21:39:46 +1100

I posted this problem to the Fedora users list first:
=============== original post
Fedora 28, up to date.

I have a new array, not yet commissioned, and one disk now complains:
    Device: /dev/sde [SAT], FAILED SMART self-check. BACK UP DATA NOW!
and the reason, as I understand it, is this:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
    200 Multi_Zone_Error_Rate   PO---K   001   001   001    NOW  0

My issue is that the disk works fine and has never had any errors reported.
Also, the smartmontools drive database does not include this disk:
    ST12000NM0007-2A1101
and
    update-smart-drivedb
is failing with
    /usr/share/smartmontools/drivedb.h.error.raw: *** BAD signature ***

I know that some disks report funny numbers and smartmontools needs to know which
ones to trust. I see a relevant ticket from 2018-06-15:
    https://www.smartmontools.org/attachment/ticket/1042/smartctl-SEAGATE-ST12000NM0007.txt
And I see that smartmontools has other issues with these disks:
    Write SCT (Get) Feature Control Command failed: scsi error badly formed scsi parameters
    Wt Cache Reorder: Unknown (SCT Feature Control command failed)

I ran some short and long SMART self-tests, but all failed immediately:
    SMART Extended Self-test Log Version: 1 (1 sectors)
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed: unknown failure    90%       250         0
    # 2  Short offline       Completed: unknown failure    90%       247         0
    # 3  Short offline       Completed: unknown failure    90%       247         0
    # 4  Extended offline    Completed without error       00%        17         -

The other (identical) disks in the array report:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
    200 Multi_Zone_Error_Rate   PO---K   100   100   001    -    0
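
For reference, the two rows above can be compared mechanically. This is only a minimal
sketch in Python, assuming the usual smartctl convention that an attribute is flagged as
failing now when its normalized VALUE is at or below THRESH (and as failed in the past
when only WORST is). The function names are hypothetical and are not part of smartmontools.

```python
def parse_attribute_row(row):
    """Split one 'smartctl -A -f brief' style row into named fields.

    Assumed column order, taken from the header shown in this post:
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    fields = row.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flags": fields[2],
        "value": int(fields[3]),   # normalized value, e.g. "001" -> 1
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "fail": fields[6],
        "raw": fields[7],
    }


def classify(attr):
    """Return 'failing_now', 'failed_in_past' or 'ok' for one attribute.

    Assumption: a threshold of 0 means the attribute can never fail.
    """
    if attr["thresh"] == 0:
        return "ok"
    if attr["value"] <= attr["thresh"]:
        return "failing_now"
    if attr["worst"] <= attr["thresh"]:
        return "failed_in_past"
    return "ok"


# The two rows quoted in this post: the complaining disk and a healthy one.
suspect = parse_attribute_row(
    "200 Multi_Zone_Error_Rate   PO---K   001   001   001    NOW  0")
healthy = parse_attribute_row(
    "200 Multi_Zone_Error_Rate   PO---K   100   100   001    -    0")

print(suspect["name"], classify(suspect))
print(healthy["name"], classify(healthy))
```

Under that assumption the suspect disk is flagged only because VALUE has reached THRESH,
while RAW_VALUE is 0 on both disks.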

I can request a replacement, but I want to make sure this is not just a SMART software issue that
may pop up again on another disk.

Should we have smartmontools v7.0 (released at the end of last year) by now? It includes this disk.

As I mentioned, the array is not yet commissioned and I can afford to fiddle with it for a while.

TIA
=============== end original post

I also tested with a current smartmontools-7.0.5, with the same results.

I expect people on this list are now using these disks, and I wonder whether anyone has seen this kind of failure,
where the Multi_Zone_Error_Rate drops from 100 to 001 without any I/O error recorded and with the disk still
apparently in working order.

I just ran fsck on the array (a 7-disk RAID6 that is 30% full: 18TB out of 60TB).
It completed without a problem. iostat shows all members with the same number of transactions.
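
The capacity figures can be checked with a short sketch. It assumes each member is a
12TB disk (which is what the ST12000NM0007 model number suggests) and that RAID6 gives
up two members' worth of space to parity. The variable names are only illustrative.

```python
# Assumed inputs: 7 members of 12 TB each, 18 TB currently in use.
members = 7
disk_tb = 12
parity_members = 2  # RAID6 reserves two members' worth of capacity for parity
used_tb = 18

usable_tb = (members - parity_members) * disk_tb
percent_full = 100 * used_tb / usable_tb

print(f"usable: {usable_tb} TB, used: {used_tb} TB ({percent_full:.0f}% full)")
```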

--
Eyal at Home (eyal@xxxxxxxxxxxxxx)