On 28. aug. 2017 08:01, hjcho616 wrote:
Hello!
I've been using Ceph for a long time, mostly for network CephFS storage,
even before the Argonaut release! It's been working very well for me. Yes,
I had some power outages before and asked a few questions on this list,
and they got resolved happily! Thank you all!
Not sure why, but we've been having quite a few power outages lately.
Ceph appeared to be running OK through them... so I was pretty
happy and didn't think much of it... until yesterday. When I started to
move some videos to CephFS, Ceph decided it was full although df
showed only 54% utilization! Then I looked, and some of the OSDs were
down (only 3 at that point)!
I am running a pretty simple Ceph configuration... one machine named MDS1
running the MDS and mon, and two OSD machines, OSD1 and OSD2, each with
five 2TB HDDs and one SSD for journals.
At the time I was running Jewel 10.2.2. I looked at some of the downed
OSDs' log files and googled the errors... they appeared to be tied to
version 10.2.2, so I upgraded everything to 10.2.9. Well, that didn't
solve my problems... =P While I was looking into this, there was
another power outage! D'oh! I may need to invest in a UPS or
something... Until this happened, all of the down OSDs were on OSD2.
But this time OSD1 took a hit! It couldn't boot because osd.0's disk was
damaged... I ran xfs_repair -L /dev/sdb1 as suggested on the command
line and was able to mount it again, phew. Then after a reboot,
/dev/sdb1 was no longer accessible! Noooo!!!
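For the record, the sequence was roughly this (the mount point is just the
default /var/lib/ceph/osd/ceph-0; the dmesg/smartctl checks are what I
probably should have done first, smartctl being part of smartmontools):
# dmesg | tail -n 50                        <- look for I/O or XFS errors
# smartctl -a /dev/sdb                      <- drive health report
# umount /var/lib/ceph/osd/ceph-0           <- in case it was still mounted
# xfs_repair -L /dev/sdb1                   <- zeroes the XFS log, last resort
# mount /dev/sdb1 /var/lib/ceph/osd/ceph-0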
So this is what I have today. I am a bit concerned, as half of the OSDs
are down and osd.0 doesn't look good at all...
# ceph osd tree
ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 16.24478 root default
-2  8.12239     host OSD1
 1  1.95250         osd.1       up  1.00000          1.00000
 0  1.95250         osd.0     down        0          1.00000
 7  0.31239         osd.7       up  1.00000          1.00000
 6  1.95250         osd.6       up  1.00000          1.00000
 2  1.95250         osd.2       up  1.00000          1.00000
-3  8.12239     host OSD2
 3  1.95250         osd.3     down        0          1.00000
 4  1.95250         osd.4     down        0          1.00000
 5  1.95250         osd.5     down        0          1.00000
 8  1.95250         osd.8     down        0          1.00000
 9  0.31239         osd.9       up  1.00000          1.00000
This looked a lot better before that last power outage... =( I can't
mount osd.0's disk anymore!
# ceph health
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 44 pgs
backfill_toofull; 80 pgs backfill_wait; 122 pgs degraded; 6 pgs down; 8
pgs inconsistent; 6 pgs peering; 2 pgs recovering; 18 pgs recovery_wait;
16 pgs stale; 122 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck
stale; 159 pgs stuck unclean; 102 pgs stuck undersized; 102 pgs
undersized; 1 requests are blocked > 32 sec; recovery 1803466/4503980
objects degraded (40.042%); recovery 692976/4503980 objects misplaced
(15.386%); recovery 147/2251990 unfound (0.007%); 1 near full osd(s); 54
scrub errors; mds cluster is degraded; no legacy OSD present but
'sortbitwise' flag is not set
Each of the down OSDs is showing a different failure signature.
I've uploaded the OSD logs with debug osd = 20, debug filestore = 20, and
debug ms = 20. You can find them at the links below. Let me know if there is
a preferred way to share these!
https://drive.google.com/open?id=0By7YztAJNGUWQXItNzVMR281Snc (ceph-osd.3.log)
https://drive.google.com/open?id=0By7YztAJNGUWYmJBb3RvLVdSQWc (ceph-osd.4.log)
https://drive.google.com/open?id=0By7YztAJNGUWaXhRMlFOajN6M1k (ceph-osd.5.log)
https://drive.google.com/open?id=0By7YztAJNGUWdm9BWFM5a3ExOFE (ceph-osd.8.log)
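For reference, these are roughly the settings I added to the [osd] section
of ceph.conf on the OSD hosts before restarting the daemons (for OSDs that
are still running, 'ceph tell osd.N injectargs' should be able to set the
same values at runtime):
[osd]
    debug osd = 20
    debug filestore = 20
    debug ms = 20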
So how does this look? Can this be fixed? =) If so, please let me know how.
I used to take backups, but the data grew so big that I wasn't able to
anymore... and I would like to get most of it back if I can. Please
let me know if you need more info!
Thank you!
Regards,
Hong
With only 2 OSD hosts, how are you doing replication? I assume you use
size=2, which is somewhat OK if you have min_size=2, but if you have
min_size=1 it can quickly turn into a big problem of lost objects.
With size=2 and min_size=2 your data should be safely on two drives (if you
can get one of them running again), but your cluster will block whenever
there is an issue.
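You can check this per pool with something like the following (the pool
names 'cephfs_data' and 'cephfs_metadata' are just the usual CephFS
defaults, use whatever 'ceph osd lspools' shows on your cluster):
# ceph osd lspools
# ceph osd pool get cephfs_data size
# ceph osd pool get cephfs_data min_size
# ceph osd pool get cephfs_metadata size
# ceph osd pool get cephfs_metadata min_size
# ceph osd pool set cephfs_data min_size 2    <- only if you decide to change it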
If at all possible I would add a third OSD node to your cluster, so that
your OK PGs can replicate to it and you can work on the down OSDs without
fear of losing additional working OSDs.
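If you go that route, with ceph-deploy it is roughly the following (the
hostname OSD3 and the device names are made up, so adjust them to your
hardware; this also assumes the new host already has the Ceph packages and
keys in place):
# ceph-deploy install OSD3
# ceph-deploy osd create OSD3:/dev/sdb:/dev/sde1   <- data disk : journal partition
# ceph-deploy osd create OSD3:/dev/sdc:/dev/sde2
# ceph osd tree                                    <- the new host should appear under root default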
Also, some of your logs contain lines like...
failed to bind the UNIX domain socket to
'/var/run/ceph/ceph-osd.3.asok': (17) File exists
filestore(/var/lib/ceph/osd/ceph-3) lock_fsid failed to lock
/var/lib/ceph/osd/ceph-3/fsid, is another ceph-osd still running? (11)
Resource temporarily unavailable
7faf16e23800 -1 osd.3 0 OSD::pre_init: object store
'/var/lib/ceph/osd/ceph-3' is currently in use. (Is ceph-osd already
running?)
7faf16e23800 -1 ** ERROR: osd pre_init failed: (16) Device or resource busy
This can indicate that you have a stuck osd.3 process keeping those
resources open and preventing a new OSD from starting.
Check with ps aux whether you can see any ceph processes. If you do find
something relating to your down OSDs, try stopping it normally, and if
that fails, kill it manually before trying to restart the OSD.
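Something along these lines (assuming the systemd units that the Jewel
packages normally install; substitute the right OSD id and PID):
# ps aux | grep ceph-osd          <- look for a leftover 'ceph-osd ... --id 3' process
# systemctl stop ceph-osd@3       <- try a clean stop first
# kill <pid>                      <- then SIGTERM the leftover process
# kill -9 <pid>                   <- SIGKILL only as a last resort
# systemctl start ceph-osd@3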
Also check dmesg for messages relating to faulty hardware or the OOM
killer. I have had experiences with the OOM killer where the OSD node
became unreliable until I rebooted the machine.
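For example, nothing Ceph-specific, just the obvious greps:
# dmesg | grep -i -E 'oom|out of memory|killed process'
# dmesg | grep -i -E 'i/o error|xfs'
# free -m                         <- see whether the node is short on memory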
kind regards, and good luck
Ronny Aasen