Tomasz,
Those machines are behind a surge protector, and apparently not a good one! I do have a UPS, but it has no battery, which is my fault. Power was pretty reliable for a while, and the UPS was beeping every chance it had and disrupting my sleep. =P So I am running on the surge protector only. This is a home environment, and so far HDD failures have been very rare here. =) It just doesn't get loaded as much! I am not sure what to expect; seeing that "unfound" and feeling there was a chance of getting the OSD back made me excited about it. =) Thanks for letting me know what the priority should be. I just lack experience and knowledge in this. =) Please do continue to guide me through this.
Thank you for decoding those SMART messages! I do agree that it looks like the disk is on its way out. I would like to know how to get a good portion of the data back, if possible. =)
I think I just set the size and min_size to 1.
# ceph osd lspools
0 data,1 metadata,2 rbd,
# ceph osd pool set rbd size 1
set pool 2 size to 1
# ceph osd pool set rbd min_size 1
set pool 2 min_size to 1
Seems to be doing some backfilling work.
# ceph health
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2 pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling; 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering; 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101 pgs stuck undersized; 101 pgs undersized; 1 requests are blocked > 32 sec; recovery 1790657/4502340 objects degraded (39.772%); recovery 641906/4502340 objects misplaced (14.257%); recovery 147/2251990 unfound (0.007%); 50 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set
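That health line is hard to read in one piece, so here is a rough Python sketch of how it could be broken down into per-state PG counts plus the unfound object count. It assumes the one-line, semicolon-separated format shown above (other Ceph releases may print it differently), and `parse_health` is just a hypothetical helper name, not part of any Ceph tooling. The sample string is an excerpt of the output above.

```python
import re


def parse_health(summary):
    """Split a one-line `ceph health` summary into status, PG state counts, and unfound objects."""
    status, _, rest = summary.partition(" ")
    pgs = {}
    for item in rest.split(";"):
        # Matches entries such as "6 pgs down" or "22 pgs are stuck inactive ..."
        m = re.match(r"\s*(\d+) pgs (?:are )?(.+)", item)
        if m:
            pgs[m.group(2).strip()] = int(m.group(1))
    # Matches "recovery 147/2251990 unfound (0.007%)"
    m = re.search(r"recovery (\d+)/(\d+) unfound", summary)
    unfound = int(m.group(1)) if m else 0
    return {"status": status, "pgs": pgs, "unfound": unfound}


if __name__ == "__main__":
    sample = ("HEALTH_ERR 6 pgs down; 6 pgs inconsistent; 16 pgs stale; "
              "recovery 147/2251990 unfound (0.007%); 50 scrub errors")
    report = parse_health(sample)
    for state in ("down", "inconsistent", "stale"):
        print(state, report["pgs"].get(state, 0))
    print("unfound objects:", report["unfound"])
```

Entries that are not PG counts (blocked requests, scrub errors, the sortbitwise warning) are simply skipped by this sketch.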
Regards,
Hong
On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz <tom.kusmierz@xxxxxxxxx> wrote:
So to decode a few things about your disk:
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 37
37 read errors and only one sector marked as pending - fun disk :/
181 Program_Fail_Cnt_Total 0x0022 099 099 000 Old_age Always - 35325174
So the firmware has quite a few bugs, that’s nice
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 2855
The disk was thrown around while operational, even nicer.
194 Temperature_Celsius 0x0002 047 041 000 Old_age Always - 53 (Min/Max 15/59)
If your disk passes 50 °C you should not consider using it; high temperatures demagnetise the platter layer and you will see more errors in the very near future.
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 1
as mentioned before :)
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 4222
Your heads keep missing tracks … bent? I don’t even know how to comment here.
Generally a fun drive you’ve got there … rescue as much as you can and throw it away!!!
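If you want to repeat this kind of check on other disks, here is a minimal Python sketch of the reading above. It assumes attribute lines in the ten-column layout quoted in this mail (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). The names `SmartAttr`, `parse_attr`, `check_attr` and `RAW_LIMITS` are made up for illustration, and the raw-value limits are only the rules of thumb from the comments above, not vendor specifications.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # worst normalized value seen
    thresh: int  # failure threshold for the normalized value
    raw: int     # leading number of the raw value column


def parse_attr(line):
    """Parse one SMART attribute line in the ten-column layout quoted above."""
    f = line.split(None, 9)
    # The raw column may carry extra text, e.g. "53 (Min/Max 15/59)"
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]),
                     int(f[9].split()[0]))


# Illustrative limits taken from the remarks in this thread (assumptions)
RAW_LIMITS = {"Current_Pending_Sector": 0, "Temperature_Celsius": 50}


def check_attr(attr):
    """Return a list of warning strings for one attribute (empty if none)."""
    out = []
    if attr.thresh > 0 and attr.value <= attr.thresh:
        out.append("normalized value at or below threshold")
    limit = RAW_LIMITS.get(attr.name)
    if limit is not None and attr.raw > limit:
        out.append(f"raw value {attr.raw} exceeds {limit}")
    return out


if __name__ == "__main__":
    line = "194 Temperature_Celsius 0x0002 047 041 000 Old_age Always - 53 (Min/Max 15/59)"
    attr = parse_attr(line)
    print(attr.name, check_attr(attr))
```

Raw counters such as G-Sense_Error_Rate or Multi_Zone_Error_Rate are vendor-specific, so this sketch deliberately does not attach limits to them.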