16 osds: 11 up, 16 in


 



On 5/7/14 13:40 , Sergey Malinin wrote:
> Check dmesg and SMART data on both nodes. This behaviour is similar to 
> failing hdd.
>
>

It does sound like a failing disk... but there's nothing in dmesg, and 
smartmontools hasn't emailed me about a failing disk.  The same thing is 
happening to more than 50% of my OSDs, on both nodes.



smartctl for osd.5 says:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       81
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       606
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       5
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   119   119   020    Pre-fail  Offline      -       35
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       4028
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       166
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       166
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 21/39)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

The weekly scheduled tests have all completed successfully:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error 00%      3922         -
# 2  Short offline       Completed without error 00%      3754         -
...
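
Because the same check has to be repeated for every OSD disk on both nodes, it can be scripted. The following is only a minimal sketch, not something taken from this thread: it assumes the attribute table is available as text with one attribute per line (the layout smartctl itself prints), and the function names, the SAMPLE rows, and the ZERO_RAW list of "raw value should be zero" attributes are illustrative assumptions rather than any official rule.

```python
import re

# Attributes whose raw value is normally zero on a healthy disk.
# This list is an assumption chosen for illustration.
ZERO_RAW = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# One attribute row: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, and the leading integer of the raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_smart_attributes(text):
    """Return a list of dicts, one per attribute row found in text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "when_failed": m.group(9),
                "raw": int(m.group(10)),
            })
    return rows


def suspicious(rows):
    """Return (name, reason) pairs for rows that look unhealthy."""
    flagged = []
    for r in rows:
        if r["thresh"] > 0 and r["value"] <= r["thresh"]:
            flagged.append((r["name"], "value at or below threshold"))
        elif r["when_failed"] != "-":
            flagged.append((r["name"], "WHEN_FAILED is set"))
        elif r["name"] in ZERO_RAW and r["raw"] > 0:
            flagged.append((r["name"], "non-zero raw value"))
    return flagged


# A few rows copied from the osd.5 output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 21/39)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(suspicious(parse_smart_attributes(SAMPLE)))
```

On the osd.5 rows above nothing is flagged, which matches the reading that SMART shows no sign of a failing disk. To use it on a live disk, the text would come from running `smartctl -A` against the device (as root) and passing its output to parse_smart_attributes.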



-- 

*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email clewis at centraldesktop.com <mailto:clewis at centraldesktop.com>

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/>  | Twitter 
<http://www.twitter.com/centraldesktop>  | Facebook 
<http://www.facebook.com/CentralDesktop>  | LinkedIn 
<http://www.linkedin.com/groups?gid=147417>  | Blog 
<http://cdblog.centraldesktop.com/>


