Re: Power outages!!! help!


 



This is what it looks like today.  Seems like the ceph-osd daemons are sitting at 0% CPU, so all the migrations appear to be done.  Does this look OK to shut down and continue when I get the HDD on Thursday?

# ceph health
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 20 pgs backfill_wait; 23 pgs degraded; 6 pgs down; 2 pgs inconsistent; 6 pgs peering; 4 pgs recovering; 3 pgs recovery_wait; 16 pgs stale; 23 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 49 pgs stuck unclean; 16 pgs stuck undersized; 16 pgs undersized; 1 requests are blocked > 32 sec; recovery 221870/2473686 objects degraded (8.969%); recovery 365398/2473686 objects misplaced (14.771%); recovery 147/2251990 unfound (0.007%); 7 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set
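
(Before I power these off I'm assuming I should set the noout flag so the cluster doesn't start another round of rebalancing while the node is down, and clear it once everything is back up, something like:

# ceph osd set noout
  ... shut down, swap in the new HDD on Thursday, boot back up ...
# ceph osd unset noout

and I guess the sortbitwise warning at the end can be cleared with '# ceph osd set sortbitwise' since it says there are no legacy OSDs. Please correct me if any of that is wrong for this situation.)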

# df
Filesystem      1K-blocks      Used  Available Use% Mounted on
udev                10240         0      10240   0% /dev
tmpfs             1584780      9212    1575568   1% /run
/dev/sda1        15247760   9610208    4839960  67% /
tmpfs             3961940         0    3961940   0% /dev/shm
tmpfs                5120         0       5120   0% /run/lock
tmpfs             3961940         0    3961940   0% /sys/fs/cgroup
/dev/sdd1      1952559676 712028032 1240531644  37% /var/lib/ceph/osd/ceph-2
/dev/sde1      1952559676 628862040 1323697636  33% /var/lib/ceph/osd/ceph-6
/dev/sdc1      1952559676 755815036 1196744640  39% /var/lib/ceph/osd/ceph-1
/dev/sdf1       312417560  42551928  269865632  14% /var/lib/ceph/osd/ceph-7
tmpfs              792392         0     792392   0% /run/user/0

I'm not sure I like what I see in fdisk... it doesn't show sdb1.  I hope it shows up when I run dd_rescue to the other drive... =P

# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.25.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

/dev/sdb: device contains a valid 'xfs' signature, it's strongly recommended to wipe the device by command wipefs(8) if this setup is unexpected to avoid possible collisions.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xe684adb6.

Command (m for help): p
Disk /dev/sdb: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xe684adb6



Command (m for help):
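
(My rough plan for the copy, and this is only a sketch with made-up device names: assuming the replacement shows up as /dev/sdc, I'll probably use GNU ddrescue rather than the older dd_rescue, something like:

# ddrescue -f -n /dev/sdb /dev/sdc /root/sdb-rescue.map
# ddrescue -f -d -r3 /dev/sdb /dev/sdc /root/sdb-rescue.map

where the first pass skips the bad areas and the second goes back and retries them, keeping progress in the map file. If I end up using plain dd instead, I'll add conv=noerror,sync as Steve suggested.)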



On Tuesday, August 29, 2017 3:29 PM, Tomasz Kusmierz <tom.kusmierz@xxxxxxxxx> wrote:


Maged, on the second host he has 4 out of 5 OSDs failed on him … I think he's past trying to increase the backfill threshold :) Of course he could try to degrade the cluster by letting it mirror within the same host :) 
On 29 Aug 2017, at 21:26, Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote:

One of the things to watch out for in small clusters is that OSDs can get full rather unexpectedly during recovery/backfill:
In your case you have 2 OSD nodes with 5 disks each. Since you have a replica count of 2, each PG will have 1 copy on each host, so if an OSD fails, all of its PGs will have to be re-created on the same host, meaning they will be distributed only among the 4 remaining OSDs on that host, which will quickly bump their usage by nearly 20% each.
The default osd_backfill_full_ratio is 85%, so if any of the 4 OSDs was near 70% utilization before the failure, it will easily reach 85% and cause the cluster to error with the backfill_toofull message you see.  This is why I suggest you add an extra disk, or try your luck raising osd_backfill_full_ratio to 92%; it may fix things.
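
If you do try the ratio bump, it can be injected at runtime without restarting the OSDs, roughly like this (double-check the option name on your Ceph version):

# ceph tell osd.* injectargs '--osd_backfill_full_ratio 0.92'

and add "osd backfill full ratio = 0.92" under [osd] in ceph.conf if you want it to stick across restarts.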
/Maged
On 2017-08-29 21:13, hjcho616 wrote:
Nice!  Thank you for the explanation!  I feel like I can revive that OSD. =)  That does sound great.  I don't quite have another cluster, so I'm waiting for a drive to arrive! =)  
 
After setting size and min_size to 1, it looks like the toofull flag is gone... Maybe when I was making that video copy the OSDs were already down... and those two OSDs were not enough to take that much extra... and on top of it, the last OSD alive was the smaller disk (320GB vs. 2TB)... so it probably was filling up faster.  I should have captured that message... but I turned the machine off and now I am at work. =P  When I get back home, I'll try to grab that and share.  Maybe I don't need to try to add another OSD to that cluster just yet!  OSDs are about 50% full on OSD1.
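
(When I get home I'll also double check with something like:

# ceph osd df
# ceph health detail | grep -i full

to confirm the utilization numbers and that the toofull condition is really gone, assuming I'm reading those right.)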
 
So next up, fixing osd0!
 
Regards,
Hong  


On Tuesday, August 29, 2017 1:05 PM, David Turner <drakonstein@xxxxxxxxx> wrote:


But it was absolutely awesome to run an osd off of an rbd after the disk failed.

On Tue, Aug 29, 2017, 1:42 PM David Turner <drakonstein@xxxxxxxxx> wrote:
To add to Steve's success: the RBD was created in a second cluster in the same datacenter, so it didn't run the risk of deadlock that mapping RBDs on machines running OSDs carries.  In theory it should still work on the same cluster, but it is inherently more dangerous for a few reasons.

On Tue, Aug 29, 2017, 1:15 PM Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx> wrote:
Hong,

Probably your best chance at recovering any data without special,
expensive, forensic procedures is to perform a dd from /dev/sdb to
somewhere else large enough to hold a full disk image and attempt to
repair that. You'll want to use 'conv=noerror' with your dd command
since your disk is failing. Then you could either re-attach the OSD
from the new source or attempt to retrieve objects from the filestore
on it.
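
Something along these lines is what I have in mind, with the image path just as an example:

# dd if=/dev/sdb of=/mnt/backup/sdb.img bs=1M conv=noerror,sync

conv=noerror keeps dd going past read errors, and sync pads the failed reads with zeros so the image keeps its size and alignment relative to the original disk.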

I have actually done this before by creating an RBD that matches the
disk size, performing the dd, running xfs_repair, and eventually
adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
temporary arrangement for repair only, but I'm happy to report that it
worked flawlessly in my case. I was able to weight the OSD to 0,
offload all of its data, then remove it for a full recovery, at which
point I just deleted the RBD.
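
In rough strokes the sequence was something like the following; the names and the 2T size are illustrative rather than my exact commands, and if you image the whole disk instead of just the OSD partition you would repair and mount the partition device (e.g. /dev/rbd0p1) instead:

# rbd create rescue-osd --size 2T
# rbd map rescue-osd
# dd if=/dev/sdb of=/dev/rbd0 bs=1M conv=noerror,sync
# xfs_repair /dev/rbd0
# mount /dev/rbd0 /var/lib/ceph/osd/ceph-0
# systemctl start ceph-osd@0      (or however your distro starts the OSD)
# ceph osd crush reweight osd.0 0

The reweight to 0 at the end is what drains the OSD once it is up and in.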

The possibilities afforded by Ceph inception are endless. ☺



Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |

If you are not the intended recipient of this message or received it erroneously, please notify the sender and delete it, together with any attachments, and be advised that any dissemination or copying of this message is prohibited.



On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
> Rule of thumb with batteries is:
> - the closer to their "proper temperature" you run them, the more life
> you get out of them
> - the more the battery is overpowered for your application, the longer
> it will survive. 
>
> Get yourself an LSI 94** controller and use it as an HBA and you will be
> fine. But get MORE DRIVES !!!!! ... 
> > On 28 Aug 2017, at 23:10, hjcho616 <hjcho616@xxxxxxxxx> wrote:
> >
> > Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
> > try these out.  Car battery idea is nice!  I may try that.. =)  Do
> > they last longer?  Ones that fit the UPS original battery spec
> > didn't last very long... part of the reason why I gave up on them..
> > =P  My wife probably won't like the idea of car battery hanging out
> > though ha!
> >
> > The OSD1 (one with mostly ok OSDs, except that smart failure)
> > motherboard doesn't have any additional SATA connectors available.
> >  Would it be safe to add another OSD host?
> >
> > Regards,
> > Hong
> >
> >
> >
> > On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz <tom.kusmierz@g
> > mail.com> wrote:
> >
> >
> > Sorry for being brutal ... anyway 
> > 1. Get the battery for the UPS (a car battery will do as well; I've
> > modded a UPS in the past with a truck battery and it was working like
> > a charm :D )
> > 2. Get spare drives and put those in, because your cluster CAN NOT
> > get out of error due to lack of space.
> > 3. Follow the advice of Ronny Aasen on how to recover data from the
> > hard drives. 
> > 4. Get cooling to the drives or you will lose more ! 
> >
> >
> > > On 28 Aug 2017, at 22:39, hjcho616 <hjcho616@xxxxxxxxx> wrote:
> > >
> > > Tomasz,
> > >
> > > Those machines are behind a surge protector.  Doesn't appear to
> > > be a good one!  I do have a UPS... but it is my fault... no
> > > battery.  Power was pretty reliable for a while... and UPS was
> > > just beeping every chance it had, disrupting some sleep.. =P  So
> > > running on the surge protector only.  I am running this in a home
> > > environment, and so far HDD failures have been very rare here. =)
> > > It just doesn't get loaded as much!  I am not sure what to expect;
> > > seeing that "unfound" count, and the feeling that I might get the
> > > OSD back, made me excited about it. =)  Thanks for letting me know
> > > what the priority should be.  I just lack experience and knowledge
> > > in this. =)  Please do continue to guide me through this. 
> > >
> > > Thank you for decoding those SMART messages!  I do agree that it
> > > looks like it is on its way out.  I would like to know how to get a
> > > good portion of it back if possible. =)
> > >
> > > I think I just set the size and min_size to 1.
> > > # ceph osd lspools
> > > 0 data,1 metadata,2 rbd,
> > > # ceph osd pool set rbd size 1
> > > set pool 2 size to 1
> > > # ceph osd pool set rbd min_size 1
> > > set pool 2 min_size to 1
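> > >
> > > (I assume that once recovery settles down I should put the
> > > replication back so I'm not sitting on a single copy, something
> > > like:
> > >
> > > # ceph osd pool set rbd size 2
> > >
> > > but please correct me if the timing or order matters there.)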
> > >
> > > Seems to be doing some backfilling work.
> > >
> > > # ceph health
> > > HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2
> > > pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling;
> > > 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering;
> > > 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs
> > > stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101
> > > pgs stuck undersized; 101 pgs undersized; 1 requests are blocked
> > > > 32 sec; recovery 1790657/4502340 objects degraded (39.772%);
> > > recovery 641906/4502340 objects misplaced (14.257%); recovery
> > > 147/2251990 unfound (0.007%); 50 scrub errors; mds cluster is
> > > degraded; no legacy OSD present but 'sortbitwise' flag is not set
> > >
> > >
> > >
> > > Regards,
> > > Hong
> > >
> > >
> > > On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz <tom.kusmierz
> > > @gmail.com> wrote:
> > >
> > >
> > > So to decode few things about your disk:
> > >
> > >   1 Raw_Read_Error_Rate    0x002f  100  100  051    Pre-fail 
> > > Always      -      37
> > > 37 read errors and only one sector marked as pending - fun disk
> > > :/ 
> > >
> > > 181 Program_Fail_Cnt_Total  0x0022  099  099  000    Old_age 
> > > Always      -      35325174
> > > So the firmware has quite a few bugs; that's nice.
> > >
> > > 191 G-Sense_Error_Rate      0x0022  100  100  000    Old_age 
> > > Always      -      2855
> > > The disk was thrown around while operational; even more nice.
> > >
> > > 194 Temperature_Celsius    0x0002  047  041  000    Old_age 
> > > Always      -      53 (Min/Max 15/59)
> > > If your disk passes 50 you should not consider using it; high
> > > temperatures demagnetise the platter coating and you will see more
> > > errors in the very near future.
> > >
> > > 197 Current_Pending_Sector  0x0032  100  100  000    Old_age 
> > > Always      -      1
> > > as mentioned before :)
> > >
> > > 200 Multi_Zone_Error_Rate  0x002a  100  100  000    Old_age 
> > > Always      -      4222
> > > your heads keep missing tracks ... bent ? I don't even know how to
> > > comment here.
> > >
> > >
> > > Generally a fun drive you've got there ... rescue as much as you can
> > > and throw it away !!!
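> > >
> > > (I assume you pulled these with something like 'smartctl -a
> > > /dev/sdb' from smartmontools; re-run it after the rescue copy if
> > > you are curious, but I would not waste time on a long self test
> > > ('smartctl -t long /dev/sdb') on a drive in this state.)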
> > >
> > >
> >
> >



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
