Hello,

On Wed, 3 Aug 2016 13:42:50 +0200 Jan Schermer wrote:

> Christian, can you post your values for Power_Loss_Cap_Test on the drive which is failing?
>
Sure:
---
175 Power_Loss_Cap_Test     0x0033   001   001   010    Pre-fail  Always   FAILING_NOW 1 (47 942)
---

Now according to the Intel data sheet that value of 1 means failed, NOT the
actual buffer time it usually means, like this on the neighboring SSD:
---
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       614 (47 944)
---

And my 800GB DC S3610s have more than 10 times the endurance, my guess is
a combo of larger cache and slower writes:
---
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       8390 (22 7948)
---

I'll definitely leave that "failing" SSD in place until it has done the
next self-check.

Christian

> Thanks
> Jan
>
>
> > On 03 Aug 2016, at 13:33, Christian Balzer <chibi@xxxxxxx> wrote:
> >
> >
> > Hello,
> >
> > yeah, I was particularly interested in the Power_Loss_Cap_Test bit, as it
> > seemed to be such an odd thing to fail (given that's not a single capacitor).
> >
> > As for your Reallocated_Sector_Ct, that's really odd and definitely an RMA
> > worthy issue.
> >
> > For the record, Intel SSDs use (typically 24) sectors when doing firmware
> > upgrades, so this is a totally healthy 3610. ^o^
> > ---
> >   5 Reallocated_Sector_Ct   0x0032   099   099   000    Old_age   Always       -       24
> > ---
> >
> > Christian
> >
> > On Wed, 3 Aug 2016 13:12:53 +0200 Daniel Swarbrick wrote:
> >
> >> Right, I actually updated to smartmontools 6.5+svn4324, which now
> >> properly supports this drive model.
> >> Some of the smart attr names have changed, and make more sense now
> >> (and there are no more "Unknowns"):
> >>
> >> ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
> >>   5 Reallocated_Sector_Ct   -O--CK   081   081   000    -    944
> >>   9 Power_On_Hours          -O--CK   100   100   000    -    1067
> >>  12 Power_Cycle_Count       -O--CK   100   100   000    -    7
> >> 170 Available_Reservd_Space PO--CK   085   085   010    -    0
> >> 171 Program_Fail_Count      -O--CK   100   100   000    -    0
> >> 172 Erase_Fail_Count        -O--CK   100   100   000    -    68
> >> 174 Unsafe_Shutdown_Count   -O--CK   100   100   000    -    6
> >> 175 Power_Loss_Cap_Test     PO--CK   100   100   010    -    6510 (4 4307)
> >> 183 SATA_Downshift_Count    -O--CK   100   100   000    -    0
> >> 184 End-to-End_Error        PO--CK   100   100   090    -    0
> >> 187 Reported_Uncorrect      -O--CK   100   100   000    -    0
> >> 190 Temperature_Case        -O---K   070   065   000    -    30 (Min/Max 25/35)
> >> 192 Unsafe_Shutdown_Count   -O--CK   100   100   000    -    6
> >> 194 Temperature_Internal    -O---K   100   100   000    -    30
> >> 197 Current_Pending_Sector  -O--C-   100   100   000    -    1100
> >> 199 CRC_Error_Count         -OSRCK   100   100   000    -    0
> >> 225 Host_Writes_32MiB       -O--CK   100   100   000    -    20135
> >> 226 Workld_Media_Wear_Indic -O--CK   100   100   000    -    20
> >> 227 Workld_Host_Reads_Perc  -O--CK   100   100   000    -    82
> >> 228 Workload_Minutes        -O--CK   100   100   000    -    64012
> >> 232 Available_Reservd_Space PO--CK   084   084   010    -    0
> >> 233 Media_Wearout_Indicator -O--CK   100   100   000    -    0
> >> 234 Thermal_Throttle        -O--CK   100   100   000    -    0/0
> >> 241 Host_Writes_32MiB       -O--CK   100   100   000    -    20135
> >> 242 Host_Reads_32MiB        -O--CK   100   100   000    -    92945
> >> 243 NAND_Writes_32MiB       -O--CK   100   100   000    -    95289
> >>
> >> Reallocated_Sector_Ct is still increasing, but Available_Reservd_Space
> >> seems to be holding steady.
> >>
> >> AFAIK, we've only had one other S3610 fail, and it seemed to be a sudden
> >> death. The drive simply disappeared from the controller one day, and
> >> could no longer be detected.
> >>
> >> On 03/08/16 12:15, Jan Schermer wrote:
> >>> Make sure you are reading the right attribute and interpreting it right.
> >>> update-smart-drivedb sometimes works wonders :)
> >>>
> >>> I wonder what the isdct tool would say the drive's life expectancy is with this workload? Are you really writing ~600TB/month??
> >>>
> >>> Jan
> >>>
> >>
> >
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
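
For anyone who wants to check these attributes by script rather than by eye, here is a minimal sketch, not a definitive tool. It assumes the legacy 10-column `smartctl -A` row layout that Christian pasted above (not the brief FLAGS layout in Daniel's table). The names `SmartAttr`, `parse_legacy_row`, `is_failing` and `units_32mib_to_tib` are made up for this example. An attribute is treated as failing when WHEN_FAILED reads FAILING_NOW or the normalized VALUE is at or below THRESH; the raw value is kept as a string because its meaning is vendor-specific.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    """One row of legacy-format `smartctl -A` output (hypothetical helper type)."""
    attr_id: int
    name: str
    value: int        # normalized VALUE column
    worst: int        # normalized WORST column
    thresh: int       # THRESH column
    when_failed: str  # "-", "FAILING_NOW" or "In_the_past"
    raw: str          # RAW_VALUE, kept as text because it is vendor-specific


def parse_legacy_row(line: str) -> SmartAttr:
    # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    # maxsplit=9 keeps a raw value such as "1 (47 942)" in one piece.
    f = line.split(None, 9)
    if len(f) != 10:
        raise ValueError(f"not a legacy-format attribute row: {line!r}")
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]),
                     f[8], f[9].strip())


def is_failing(attr: SmartAttr) -> bool:
    # Failing if smartctl says so, or if VALUE has dropped to THRESH or below.
    return attr.when_failed == "FAILING_NOW" or attr.value <= attr.thresh


def units_32mib_to_tib(raw_units: int) -> float:
    # Host_Writes_32MiB and NAND_Writes_32MiB count units of 32 MiB.
    return raw_units * 32 / (1024 * 1024)


if __name__ == "__main__":
    rows = [
        "175 Power_Loss_Cap_Test 0x0033 001 001 010 Pre-fail Always FAILING_NOW 1 (47 942)",
        "175 Power_Loss_Cap_Test 0x0033 100 100 010 Pre-fail Always - 614 (47 944)",
    ]
    for row in rows:
        attr = parse_legacy_row(row)
        status = "FAILING" if is_failing(attr) else "ok"
        print(attr.attr_id, attr.name, attr.raw, status)
    # Host writes from Daniel's table, converted to TiB.
    print(round(units_32mib_to_tib(20135), 3), "TiB written by host")
```

The same conversion applied to NAND_Writes_32MiB (95289) divided by Host_Writes_32MiB (20135) gives a rough write amplification figure for that drive.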