NVMe M.2 disk problem

Alessandro Baggi <alessandro.baggi@xxxxxxxxx> · Sun, 24 Feb 2019 11:08:52 +0100

Hi list,
I'm running CentOS 7.6 on a Corsair Force MP500 120 GB. The root fs is ext4
and the drive is about 1 year old.
The system works very well except at boot: during every boot I get a file
system check on the NVMe drive.

Running smartctl on this drive I got this:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    1%
Data Units Read:                    5,355,595 [2,74 TB]
Data Units Written:                 5,826,517 [2,98 TB]
Host Read Commands:                 67,978,550
Host Write Commands:                75,422,898
Controller Busy Time:               32,863
Power Cycles:                       811
Power On Hours:                     2,813
Unsafe Shutdowns:                   317
Media and Data Integrity Errors:    0
Error Information Log Entries:      177
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 2:               77 Celsius
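
As a sanity check on the counters: the NVMe specification defines one "data unit" as 1000 units of 512 bytes, so the raw counts convert directly to bytes. The sketch below reproduces smartctl's bracketed figures. The helper name is illustrative. The result suggests the comma in "[2,74 TB]" is a decimal separator from the system locale, not a thousands separator.

```python
# One NVMe data unit = 1000 x 512 bytes. The drive rounds the counter
# up, so the converted value is an approximation, not an exact count.
BYTES_PER_DATA_UNIT = 1000 * 512

def data_units_to_bytes(units: int) -> int:
    """Convert an NVMe 'Data Units Read/Written' counter to bytes (illustrative helper)."""
    return units * BYTES_PER_DATA_UNIT

# Counters copied from the smartctl output above.
for name, units in [("read", 5_355_595), ("written", 5_826_517)]:
    tb = data_units_to_bytes(units) / 1e12  # decimal terabytes
    print(f"{name}: {tb:.2f} TB")
```

This prints 2.74 TB read and 2.98 TB written, matching the bracketed values.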

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        177     0  0x0014  0x4004      - 8796109799680     1     -
  1        176     0  0x0019  0x4004      - 8796109799680     1     -
  2        175     0  0x001a  0x4004      - 8796109799680     1     -
  3        174     0  0x0005  0x4004      - 8796109799680     1     -
  4        173     0  0x000c  0x4004      - 8796109799680     1     -
  5        172     0  0x0019  0x4004      - 8796109799680     1     -
  6        171     0  0x001d  0x4004      - 8796109799680     1     -
  7        170     0  0x0014  0x4004      - 8796109799680     1     -
  8        169     0  0x0011  0x4004      - 8796109799680     1     -
  9        168     0  0x000f  0x4004      - 8796109799680     1     -
 10        167     0  0x0000  0x4004      - 8796109799680     1     -
 11        166     0  0x0006  0x4004      - 8796109799680     1     -
 12        165     0  0x0008  0x4004      - 8796109799680     1     -
 13        164     0  0x000e  0x4004      - 8796109799680     1     -
 14        163     0  0x0008  0x4004      - 8796109799680     1     -
 15        162     0  0x0006  0x4004      - 8796109799680     1     -
... (48 entries not shown)
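
Every error log entry shown has the same Status value, 0x4004, and SQId 0, which is the admin submission queue. The sketch below splits that status into its NVMe bit fields. It assumes smartctl prints the raw 16-bit status field with the Phase Tag in bit 0; if the column were already shifted right by one, the decoding would differ. Under that assumption the value decodes to generic status 0x02, "Invalid Field in Command", with the More bit set. That would point to a rejected admin command rather than a media problem, which is consistent with "Media and Data Integrity Errors: 0". The function and table names are illustrative.

```python
# A few Generic Command Status codes (Status Code Type 0) from the NVMe spec.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x03: "Command ID Conflict",
    0x04: "Data Transfer Error",
}

def decode_status_field(raw: int) -> dict:
    """Split a 16-bit NVMe status field into named bit fields.

    Assumption: bit 0 is the Phase Tag, i.e. the value is the raw
    field and has not already been shifted right by one.
    """
    return {
        "phase_tag": raw & 0x1,                 # bit 0
        "status_code": (raw >> 1) & 0xFF,       # bits 8:1
        "status_code_type": (raw >> 9) & 0x7,   # bits 11:9
        "more": (raw >> 14) & 0x1,              # bit 14
        "do_not_retry": (raw >> 15) & 0x1,      # bit 15
    }

fields = decode_status_field(0x4004)
print(fields)
if fields["status_code_type"] == 0:  # generic command status
    print(GENERIC_STATUS.get(fields["status_code"], "other generic status"))
```

Decoding the bits only identifies the kind of error the drive logged. By itself, it does not explain the unsafe shutdowns or the fsck at boot.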

I noticed that the Unsafe Shutdowns counter is increasing rapidly, and I don't
know why the drive is recording unsafe shutdowns at all: every 3 or 4 boots
this value increases by 1.
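
The lifetime counters give a rough rate to compare with that observation. The sketch below parses the relevant "Name: value" lines, copied from the output above, and compares Unsafe Shutdowns with Power Cycles. The regex and helper name are illustrative and not part of smartctl. Power Cycles counts power cycles of the drive, which may not match the number of boots one-to-one.

```python
import re

# Excerpt of the smartctl output quoted in this message.
SMART_TEXT = """\
Power Cycles:                       811
Power On Hours:                     2,813
Unsafe Shutdowns:                   317
"""

def parse_counters(text: str) -> dict:
    """Map each 'Name:   1,234' line to an int, dropping thousands separators."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"^(.+?):\s+([\d,]+)\s*$", line)
        if m:
            counters[m.group(1)] = int(m.group(2).replace(",", ""))
    return counters

c = parse_counters(SMART_TEXT)
ratio = c["Unsafe Shutdowns"] / c["Power Cycles"]
print(f"{ratio:.1%} of power cycles were unsafe")
print(f"about one unsafe shutdown every {1 / ratio:.1f} power cycles")
```

Over the drive's lifetime this works out to about 39.1% of power cycles, or roughly one unsafe shutdown every 2.6 cycles.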

I can't find any errors in the system logs.

Can someone point me in the right direction?

Thanks in advance.

Alessandro.
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos