Hi,

I haven’t had problems with Power_Loss_Cap_Test so far.

Regarding Reallocated_Sector_Ct (SMART ID: 5/05h), you can check “Available
Reserved Space” (SMART ID: 232/E8h). The data sheet
(http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/ssd-dc-s3610-spec.pdf)
reads:

"This attribute reports the number of reserve blocks remaining. The
normalized value begins at 100 (64h), which corresponds to 100 percent
availability of the reserved space. The threshold value for this attribute
is 10 percent availability."

According to the SMART data you copied, about 84% of the over-provisioning
should still be left. Since the drive is pretty young, this might be some
form of defect. I have a number of S3610s with ~150 drive writes, and all
their SMART counters are still at their initial values (except for the
temperature).
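For what it's worth, here is a rough sketch of how one could keep an eye on
those two attributes with smartmontools and a few lines of Python. The device
path, the warning threshold and the exact attribute names (copied from the
smartctl output quoted below) are assumptions, so treat it as a starting
point rather than a finished check:

#!/usr/bin/env python
# Rough sketch: read the two attributes discussed above via smartctl and
# warn when the reserved space runs low. Assumes smartmontools is installed
# and that the script has enough privileges to query the drive.
# The device path and the warning threshold are made-up examples.

import subprocess
import sys

DEVICE = "/dev/sdb"      # hypothetical device node, point it at the S3610
WARN_NORMALIZED = 20     # warn well before the documented threshold of 10

def smart_attributes(device):
    """Return {attribute_name: (normalized_value, raw_value_string)}."""
    out = subprocess.check_output(["smartctl", "-A", device],
                                  universal_newlines=True)
    attrs = {}
    for line in out.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the normalized VALUE is the
        # 4th column and the raw value is the last field (good enough for the
        # two plain counters we care about here).
        if len(fields) >= 6 and fields[0].isdigit() and fields[3].isdigit():
            attrs[fields[1]] = (int(fields[3]), fields[-1])
    return attrs

def main():
    attrs = smart_attributes(DEVICE)
    reserved = attrs.get("Available_Reservd_Space", (None, None))[0]
    realloc = attrs.get("Reallocated_Sector_Ct", (None, "0"))[1]

    print("Reallocated sectors: %s" % realloc)
    if reserved is not None:
        # The normalized value starts at 100 (100% of reserve blocks left),
        # per the data sheet; the vendor threshold is 10.
        print("Reserved space left: ~%d%%" % reserved)
        if reserved <= WARN_NORMALIZED:
            sys.exit("WARNING: reserved space down to %d%%" % reserved)

if __name__ == "__main__":
    main()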
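The "not even two full drive-writes" figure also checks out against the
device statistics quoted below; a throwaway calculation, assuming 512-byte
logical sectors and the 480 GB nominal capacity:

sectors_written = 1319318736   # GP Log 0x04, "Logical Sectors Written"
sector_size = 512              # bytes; assumed logical sector size
capacity_bytes = 480e9         # 480 GB usable capacity (decimal), assumed

bytes_written = sectors_written * sector_size
print("written: %.1f GB" % (bytes_written / 1e9))                     # ~675.5 GB
print("full drive writes: %.2f" % (bytes_written / capacity_bytes))   # ~1.41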
Cheers,
Maxime

On 03/08/16 11:12, "ceph-users on behalf of Daniel Swarbrick" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of daniel.swarbrick@xxxxxxxxxxxxxxxx> wrote:

>Hi Christian,
>
>Intel drives are good, but apparently not infallible. I'm watching a DC
>S3610 480GB die from reallocated sectors.
>
>ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
>  5 Reallocated_Sector_Ct   -O--CK   081   081   000    -    756
>  9 Power_On_Hours          -O--CK   100   100   000    -    1065
> 12 Power_Cycle_Count       -O--CK   100   100   000    -    7
>175 Program_Fail_Count_Chip PO--CK   100   100   010    -    17454078318
>183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
>184 End-to-End_Error        PO--CK   100   100   090    -    0
>187 Reported_Uncorrect      -O--CK   100   100   000    -    0
>190 Airflow_Temperature_Cel -O---K   070   065   000    -    30 (Min/Max 25/35)
>192 Power-Off_Retract_Count -O--CK   100   100   000    -    6
>194 Temperature_Celsius     -O---K   100   100   000    -    30
>197 Current_Pending_Sector  -O--C-   100   100   000    -    1288
>199 UDMA_CRC_Error_Count    -OSRCK   100   100   000    -    0
>228 Power-off_Retract_Count -O--CK   100   100   000    -    63889
>232 Available_Reservd_Space PO--CK   084   084   010    -    0
>233 Media_Wearout_Indicator -O--CK   100   100   000    -    0
>241 Total_LBAs_Written      -O--CK   100   100   000    -    20131
>242 Total_LBAs_Read         -O--CK   100   100   000    -    92945
>
>The Reallocated_Sector_Ct is increasing about once a minute. I'm not sure
>how many reserved sectors the drive has, i.e., how soon it will start
>throwing write I/O errors.
>
>It's a very young drive, with only 1065 hours on the clock, and it has not
>even done two full drive-writes:
>
>Device Statistics (GP Log 0x04)
>Page Offset Size        Value  Description
>  1  =====  =               =   == General Statistics (rev 2) ==
>  1  0x008  4               7   Lifetime Power-On Resets
>  1  0x018  6      1319318736   Logical Sectors Written
>  1  0x020  6       137121729   Number of Write Commands
>  1  0x028  6      6091245600   Logical Sectors Read
>  1  0x030  6       115252407   Number of Read Commands
>
>Fortunately this drive is not used as a Ceph journal. It's in an mdraid
>RAID5 array :-|
>
>Cheers,
>Daniel
>
>On 03/08/16 07:45, Christian Balzer wrote:
>>
>> Hello,
>>
>> Not a Ceph-specific issue, but this is probably the largest sample size
>> of SSD users I'm familiar with. ^o^
>>
>> This morning I was woken at 4:30 by Nagios, one of our Ceph nodes having
>> a religious experience.
>>
>> It turns out that the SMART check plugin I run mostly to get an early
>> wear-out warning detected a "Power_Loss_Cap_Test" failure in one of the
>> 200GB DC S3700s used for journals.
>>
>> While SMART is of the opinion that this drive is failing and will explode
>> spectacularly any moment, that particular failure is of little worry to
>> me, never mind that I'll eventually replace this unit.
>>
>> What brings me here is that this is the first time in over 3 years that
>> an Intel SSD has shown a (harmless, in this case) problem, so I'm
>> wondering if this particular failure has been seen by others.
>>
>> That of course entails people actually monitoring for these things. ^o^
>>
>> Thanks,
>>
>> Christian
>>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com