Firefly OSDs stuck in creating state forever

Bruce.McFarland@xxxxxxxxxxxxxxxx (Bruce McFarland) · Mon, 4 Aug 2014 17:04:41 +0000

2014-08-04 09:57:37.144649 7f42171c8700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001a90).fault
2014-08-04 09:58:07.145097 7f4215ac3700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:58:37.145491 7f42171c8700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 09:59:07.145776 7f4215ac3700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:59:37.146043 7f42171c8700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
2014-08-04 10:00:07.146288 7f4215ac3700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 10:00:37.146543 7f42171c8700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault

209.243.160.35 - monitor
209.243.160.51 - osd.0
209.243.160.52 - osd.3
209.243.160.59 - osd.2
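Sage asks below whether these faults come from an OSD or a mon. The short Python sketch here decodes the log lines to answer that. It assumes the messenger line layout shown above (timestamp, thread id, level, then "local >> peer pipe(...).fault"). It also reuses the IP-to-role table above. The regex and the helper names are illustrative and are not part of Ceph.

```python
import re
from datetime import datetime

# IP-to-role table from the message above.
ROLES = {
    "209.243.160.35": "monitor",
    "209.243.160.51": "osd.0",
    "209.243.160.52": "osd.3",
    "209.243.160.59": "osd.2",
}

# Assumed layout of a messenger fault line:
# <date> <time> <thread> <level> -- <ip>:<port>/<nonce> >> <ip>:<port>/<nonce> pipe(...).fault
FAULT_RE = re.compile(
    r"^(?P<ts>\S+ \S+) \S+\s+\d+ -- "
    r"(?P<local>[\d.]+):(?P<lport>\d+)/\d+ >> "
    r"(?P<peer>[\d.]+):(?P<pport>\d+)/\d+ pipe\(.*\)\.fault$"
)


def parse_fault(line):
    """Return (timestamp, local_ip, local_port, peer_ip, peer_port), or None."""
    m = FAULT_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return ts, m["local"], int(m["lport"]), m["peer"], int(m["pport"])


def summarize(lines):
    """Count fault lines, the gaps between them, and which hosts/ports are involved."""
    faults = [f for f in map(parse_fault, lines) if f is not None]
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(faults, faults[1:])]
    sources = {(ROLES.get(f[1], "unknown"), f[2]) for f in faults}
    targets = {(ROLES.get(f[3], "unknown"), f[4]) for f in faults}
    return {"count": len(faults), "gaps": gaps, "sources": sources, "targets": targets}


LOG = """\
2014-08-04 09:57:37.144649 7f42171c8700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001a90).fault
2014-08-04 09:58:07.145097 7f4215ac3700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204001530 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204001320).fault
2014-08-04 09:58:37.145491 7f42171c8700  0 -- 209.243.160.35:0/1032499 >> 209.243.160.35:6789/0 pipe(0x7f4204007dd0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4204003eb0).fault
"""

print(summarize(LOG.splitlines()))
```

On the first three lines, this reports gaps of about 30 seconds. Every source is ("monitor", 0) and every target is ("monitor", 6789). That means a process on the monitor host keeps failing to connect to the mon port on the same host. The port 0 side suggests a client rather than an OSD, but that is an inference.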

-----Original Message-----
From: Sage Weil [mailto:sweil@xxxxxxxxxx] 
Sent: Sunday, August 03, 2014 11:15 AM
To: Bruce McFarland
Cc: Brian Rak; ceph-users at lists.ceph.com
Subject: Re: Firefly OSDs stuck in creating state forever

On Sun, 3 Aug 2014, Bruce McFarland wrote:
> Is there a recommended way to take everything down and restart the
> process? I was considering starting completely from scratch, i.e. an OS
> reinstall, and then using ceph-deploy as before.

If you're using ceph-deploy, then

 ceph-deploy purge HOST
 ceph-deploy purgedata HOST

will do it.  Then remove the ceph.* (config and keyring) files from the current directory.

> I've learned a lot and want to figure out a foolproof way I can
> document for others in our lab to bring up a cluster on new HW. I
> learn a lot more when I break things and have to figure out what went
> wrong, so it's a little frustrating, but I've found out a lot about
> verifying the configuration and debug options so far. My intent is to
> investigate rbd usage, perf, and configuration options.
> 
> The "endless loop" I'm referring to is a constant stream of fault
> messages that I don't yet know how to interpret. I have let
> them run to see if the cluster recovers, but ceph-mon always crashed.
> I'll look for the crash dump and save it, since kdump should be enabled
> on the monitor box.

Do you have one of the messages handy?  I'm curious whether it is an OSD or a mon.

Thanks!
sage

> Thanks for the feedback. 
> 
> 
> > On Aug 3, 2014, at 8:30 AM, "Sage Weil" <sweil at redhat.com> wrote:
> > 
> > Hi Bruce,
> > 
> >> On Sun, 3 Aug 2014, Bruce McFarland wrote:
> >> Yes, I looked at tcpdump on each of the OSDs and saw communications
> >> between all 3 OSDs before I sent my first question to this list.
> >> When I disabled selinux on the one offending server based on your
> >> feedback (typically we have this disabled on lab systems that are
> >> only on the lab net), the 10 PGs in my test pool all went to
> >> "active+clean" almost immediately. Unfortunately the 3 default
> >> pools still remain in the creating states and are not HEALTH_OK.
> >> The OSDs all stayed UP/IN after the selinux change for the rest of
> >> the day, until I made the mistake of creating an RBD image on
> >> demo-pool and its 10 "active+clean" PGs. I created the rbd, but
> >> when I attempted to look at it with "rbd info" the cluster went
> >> into an endless loop trying to read a placement group, which I
> >> left running overnight. This morning
> > 
> > What do you mean by "went into an endless loop"?
> > 
> >> ceph-mon had crashed again. I'll probably start all over from
> >> scratch once again on Monday.
> > 
> > Was there a stack dump in the mon log?
> > 
> > It is possible that there is a bug with pool creation that surfaced 
> > by having selinux in place for so long, but otherwise this scenario 
> > doesn't make much sense to me.  :/  Very interested in hearing more, 
> > and/or whether you can reproduce it.
> > 
> > Thanks!
> > sage
> > 
> > 
> >> 
> >> I deleted ceph-mds and got rid of the "laggy" comments from "ceph health".
> >> The "official" online Ceph docs on that say "coming soon", and most
> >> references I could find were pre-Firefly, so it took a little trial
> >> and error to figure out that I had to use the pool number, not its
> >> name, to get the removal to work. Same with "ceph mds newfs" to get
> >> rid of the "laggy-ness" in the "ceph health" output.
> >> 
> >>  
> >> 
> >> [root@essperf3 Ceph]# ceph mds rm 0  mds.essperf3
> >> 
> >> mds gid 0 dne
> >> 
> >> [root@essperf3 Ceph]# ceph health
> >> 
> >> HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192 pgs stuck 
> >> inactive; 192 pgs stuck unclean mds essperf3 is laggy
> >> 
> >> [root@essperf3 Ceph]# ceph mds newfs 1 0  --yes-i-really-mean-it
> >> 
> >> new fs with metadata pool 1 and data pool 0
> >> 
> >> [root@essperf3 Ceph]# ceph health
> >> 
> >> HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192 pgs stuck 
> >> inactive; 192 pgs stuck unclean
> >> 
> >> [root@essperf3 Ceph]#
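When comparing health output across restarts, it can help to turn the summary into numbers. This is a minimal Python sketch that assumes the one-line "ceph health" format shown above: a status word followed by ";"-separated items. parse_health is a hypothetical helper, not part of Ceph. Applied to the last output above, it shows that 96 incomplete + 96 peering = 192 stuck inactive. That is 202 total PGs minus the 10 active+clean PGs in demo-pool reported elsewhere in this thread.

```python
import re


def parse_health(summary):
    """Split a one-line `ceph health` summary into (status, PG counts, other items)."""
    status, _, rest = summary.strip().partition(" ")
    pg_counts = {}
    other = []
    for item in (part.strip() for part in rest.split(";")):
        m = re.match(r"(\d+) pgs (.+)$", item)
        if m:
            pg_counts[m.group(2)] = int(m.group(1))
        elif item:
            # Non-PG items, e.g. blocked requests or cluster flags.
            other.append(item)
    return status, pg_counts, other


status, pg_counts, other = parse_health(
    "HEALTH_WARN 96 pgs incomplete; 96 pgs peering; "
    "192 pgs stuck inactive; 192 pgs stuck unclean"
)
print(status, pg_counts, other)
```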
> >> 
> >> From: Brian Rak [mailto:brak at gameservers.com]
> >> Sent: Friday, August 01, 2014 6:14 PM
> >> To: Bruce McFarland; ceph-users at lists.ceph.com
> >> Subject: Re: [ceph-users] Firefly OSDs stuck in creating state 
> >> forever
> >> 
> >>  
> >> 
> >> What happens if you remove nodown?  I'd be interested to see what 
> >> OSDs it thinks are down. My next thought would be tcpdump on the private interface.
> >> See if the OSDs are actually managing to connect to each other.
> >> 
> >> For comparison, when I bring up a cluster of 3 OSDs it goes to
> >> HEALTH_OK nearly instantly (definitely under a minute!), so it's
> >> probably not just taking a while.
> >> 
> >> Does 'ceph osd dump' show the proper public and private IPs?
> >> 
> >> On 8/1/2014 6:13 PM, Bruce McFarland wrote:
> >> 
> >>      MDS: I assumed that I'd need to bring up a ceph-mds for my
> >>      cluster at initial bringup. We also intended to modify the CRUSH
> >>      map so that its pool is resident on SSD(s). It is one of the
> >>      areas of the online docs that doesn't seem to have much info,
> >>      and I haven't spent a lot of time researching it. I'll stop it.
> >> 
> >>       
> >> 
> >>      OSD connectivity: The connectivity is good for both 1GE and
> >>      10GE. I thought moving to 10GE with nothing else on that net
> >>      might help with placement groups etc. and bring up the PGs
> >>      quicker. I've checked "tcpdump" output on all boxes.
> >> 
> >>      Firewall: Thanks for that one; it's the "basic" I overlooked
> >>      in my Ceph learning curve. One of the OSDs had selinux=enforcing;
> >>      all the others had it disabled. After changing that box, the 10 PGs
> >>      in my demo-pool (I kept the PG count very small for sanity) are now
> >>      "active+clean". The PGs for the default pools (data,
> >>      metadata, rbd) are still stuck in creating+peering or
> >>      creating+incomplete. I did have to manually set "osd pool
> >>      default min size = 1" (from its default of 2) for these 3 pools
> >>      to eliminate a bunch of warnings in the "ceph health detail"
> >>      output.
> >> 
> >>      I'm adding the [mon] setting you suggested below, stopping
> >>      ceph-mds, and bringing everything up now.
> >> 
> >>      [root@essperf3 Ceph]# ceph -s
> >> 
> >>          cluster 4b3ffe60-73f4-4512-b7da-b04e4775dd73
> >> 
> >>           health HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192
> >>      pgs stuck inactive; 192 pgs stuck unclean; 28 requests are
> >>      blocked > 32 sec; nodown,noscrub flag(s) set
> >> 
> >>           monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0},
> >>      election epoch 1, quorum 0 essperf3
> >> 
> >>           mdsmap e43: 1/1/1 up {0=essperf3=up:creating}
> >> 
> >>           osdmap e752: 3 osds: 3 up, 3 in
> >> 
> >>                  flags nodown,noscrub
> >> 
> >>            pgmap v1483: 202 pgs, 4 pools, 0 bytes data, 0 objects
> >> 
> >>                  134 MB used, 1158 GB / 1158 GB avail
> >> 
> >>                        96 creating+peering
> >> 
> >>                        10 active+clean  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<!!!!!!!!
> >> 
> >>                        96 creating+incomplete
> >> 
> >>      [root@essperf3 Ceph]#
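The PG state lines in "ceph -s" should add up to the total on the pgmap line. A quick check like the following confirms that all 202 PGs are accounted for: 96 creating+peering, 10 active+clean and 96 creating+incomplete. It is a Python sketch that assumes the plain-text layout shown above, and pg_state_totals is a hypothetical name.

```python
import re


def pg_state_totals(status_text):
    """Return (total PGs from the pgmap line, {state: count}) from `ceph -s` text."""
    total = None
    states = {}
    for line in status_text.splitlines():
        line = line.strip()
        m = re.match(r"pgmap v\d+: (\d+) pgs,", line)
        if m:
            total = int(m.group(1))
            continue
        # State lines look like "<count> <state>[+<state>...]" and follow the pgmap line.
        m = re.match(r"(\d+) ([a-z_]+(?:\+[a-z_]+)*)\b", line)
        if m and total is not None:
            states[m.group(2)] = states.get(m.group(2), 0) + int(m.group(1))
    return total, states


SAMPLE = """\
     pgmap v1483: 202 pgs, 4 pools, 0 bytes data, 0 objects
           134 MB used, 1158 GB / 1158 GB avail
                 96 creating+peering
                 10 active+clean
                 96 creating+incomplete
"""

total, states = pg_state_totals(SAMPLE)
print(total, states, sum(states.values()) == total)
```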
> >> 
> >>       
> >> 
> >>      From: Brian Rak [mailto:brak at gameservers.com]
> >>      Sent: Friday, August 01, 2014 2:54 PM
> >>      To: Bruce McFarland; ceph-users at lists.ceph.com
> >>      Subject: Re: [ceph-users] Firefly OSDs stuck in creating state
> >>      forever
> >> 
> >>  
> >> 
> >> Why do you have a MDS active?  I'd suggest getting rid of that at 
> >> least until you have everything else working.
> >> 
> >> I see you've set nodown on the OSDs, did you have problems with the 
> >> OSDs flapping?  Do the OSDs have broken connectivity between 
> >> themselves?  Do you have some kind of firewall interfering here?
> >> 
> >> I've seen odd issues when the OSDs have broken private networking, 
> >> you'll get one OSD marking all the other ones down.  Adding this to 
> >> my config helped:
> >> 
> >> [mon]
> >> mon osd min down reporters = 2
> >> 
> >> 
> >> On 8/1/2014 5:41 PM, Bruce McFarland wrote:
> >> 
> >>      Hello,
> >> 
> >>      I've run out of ideas and assume I've overlooked something
> >>      very basic. I've created 2 Ceph clusters in the last 2
> >>      weeks with different OSD HW and private network fabrics:
> >>      1GE and 10GE. I have never been able to get the OSDs to
> >>      come up to the "active+clean" state. I have followed your
> >>      online documentation, and at this point the only thing I
> >>      don't think I've done is modify the CRUSH map (although
> >>      I have been looking into that). These are new clusters
> >>      with no data and only 1 HDD and 1 SSD per OSD node (24 2.5 GHz
> >>      cores with 64GB RAM).
> >> 
> >>       
> >> 
> >>      Since the disks are being recycled, is there something I
> >>      need to flag to let Ceph just create its mappings but
> >>      not scrub for data compatibility? I've tried setting the
> >>      noscrub flag, to no effect.
> >> 
> >>       
> >> 
> >>      I also have constant OSD flapping. I've set nodown, but
> >>      assume that is just masking a problem that is still
> >>      occurring.
> >> 
> >>       
> >> 
> >>      Besides never reaching the "active+clean" state,
> >>      ceph-mon always crashes after I leave it running
> >>      overnight. The OSDs all eventually fill /root with
> >>      Ceph logs, so I regularly have to bring everything down,
> >>      delete the logs, and restart.
> >> 
> >>       
> >> 
> >>      I have all sorts of output: the ceph.conf; OSD boot
> >>      output with "debug osd = 20" and "debug ms = 1"; "ceph -w"
> >>      output; and pretty much all of the debug/monitoring
> >>      suggestions from the online docs and 2 weeks of Google
> >>      searches of online references in blogs, mailing lists,
> >>      etc.
> >> 
> >>       
> >> 
> >>      [root@essperf3 Ceph]# ceph -v
> >> 
> >>      ceph version 0.80.1
> >>      (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
> >> 
> >>      [root@essperf3 Ceph]# ceph -s
> >> 
> >>          cluster 4b3ffe60-73f4-4512-b7da-b04e4775dd73
> >> 
> >>           health HEALTH_WARN 96 pgs incomplete; 106 pgs
> >>      peering; 202 pgs stuck inactive; 202 pgs stuck unclean;
> >>      nodown,noscrub flag(s) set
> >> 
> >>           monmap e1: 1 mons at
> >>      {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum
> >>      0 essperf3
> >> 
> >>           mdsmap e43: 1/1/1 up {0=essperf3=up:creating}
> >> 
> >>           osdmap e752: 3 osds: 3 up, 3 in
> >> 
> >>                  flags nodown,noscrub
> >> 
> >>            pgmap v1476: 202 pgs, 4 pools, 0 bytes data, 0
> >>      objects
> >> 
> >>                  134 MB used, 1158 GB / 1158 GB avail
> >> 
> >>                       106 creating+peering
> >> 
> >>                        96 creating+incomplete
> >> 
> >>      [root@essperf3 Ceph]#
> >> 
> >>       
> >> 
> >>      Suggestions?
> >> 
> >>      Thanks,
> >> 
> >>      Bruce
> >> 
> >> 
> >> 
> >> 
> >> 
> >> _______________________________________________
> >> 
> >> ceph-users mailing list
> >> 
> >> ceph-users at lists.ceph.com
> >> 
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com