Firefly OSDs stuck in creating state forever

Bruce.McFarland@xxxxxxxxxxxxxxxx (Bruce McFarland) · Sun, 3 Aug 2014 17:19:02 +0000

Ignore the kdump comment: the kernel didn't crash, only ceph-mon. I'll save that portion of the ceph-mon log.

> On Aug 3, 2014, at 9:58 AM, "Bruce McFarland" <Bruce.McFarland at taec.toshiba.com> wrote:
> 
> Is there a recommended way to take everything down and restart the process? I was considering starting completely from scratch, i.e. an OS reinstall and then using ceph-deploy as before.
> I've learned a lot and want to figure out a foolproof way I can document for others in our lab to bring up a cluster on new HW.  I learn a lot more when I break things and have to figure out what went wrong, so it's a little frustrating, but I've found out a lot about verifying the configuration and debug options so far. My intent is to investigate rbd usage, perf, and configuration options.
> 
> The "endless loop" I'm referring to is a constant stream of fault messages that I don't yet know how to interpret. I have let them run to see if the cluster recovers, but ceph-mon always crashed. I'll look for the crash dump and save it since kdump should be enabled on the monitor box.
> 
> Thanks for the feedback. 
> 
> 
>> On Aug 3, 2014, at 8:30 AM, "Sage Weil" <sweil at redhat.com> wrote:
>> 
>> Hi Bruce,
>> 
>>> On Sun, 3 Aug 2014, Bruce McFarland wrote:
>>> Yes, I looked at tcpdump on each of the OSDs and saw communication between
>>> all 3 OSDs before I sent my first question to this list. When I disabled
>>> selinux on the one offending server based on your feedback (typically we
>>> have this disabled on lab systems that are only on the lab net) the 10 PGs
>>> in my test pool all went to "active+clean" almost immediately. Unfortunately
>>> the 3 default pools still remain in the creating states and are not
>>> HEALTH_OK. The OSDs all stayed UP/IN after the selinux change for the rest
>>> of the day until I made the mistake of creating an RBD image on demo-pool and
>>> its 10 "active+clean" PGs. I created the rbd, but when I attempted to
>>> look at it with "rbd info" the cluster went into an endless loop trying to
>>> read a placement group, a loop that I left running overnight. This morning
>> 
>> What do you mean by "went into an endless loop"?
>> 
>>> ceph-mon had crashed again. I'll probably start all over from scratch once
>>> again on Monday.
>> 
>> Was there a stack dump in the mon log?
>> 
>> It is possible that there is a bug with pool creation that surfaced 
>> by having selinux in place for so long, but otherwise this scenario 
>> doesn't make much sense to me.  :/  Very interested in hearing more, 
>> and/or whether you can reproduce it.
>> 
>> Thanks!
>> sage
>> 
>> 
>>> 
>>> 
>>> 
>>> I deleted ceph-mds and got rid of the "laggy" comments from "ceph health".
>>> The "official" online Ceph docs on that are "coming soon" and most references I
>>> could find were pre-Firefly, so it took a little trial and error to figure out
>>> that I had to use the pool number and not its name to get the removal to work.
>>> Same with "ceph mds newfs" to get rid of the "laggy-ness" in the "ceph health"
>>> output.
>>> 
>>> 
>>> 
>>> [root@essperf3 Ceph]# ceph mds rm 0  mds.essperf3
>>> mds gid 0 dne
>>> [root@essperf3 Ceph]# ceph health
>>> HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192 pgs stuck inactive; 192
>>> pgs stuck unclean mds essperf3 is laggy
>>> [root@essperf3 Ceph]# ceph mds newfs 1 0  --yes-i-really-mean-it
>>> new fs with metadata pool 1 and data pool 0
>>> [root@essperf3 Ceph]# ceph health
>>> HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192 pgs stuck inactive; 192
>>> pgs stuck unclean
>>> [root@essperf3 Ceph]#
>>> 
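To compare the "ceph health" summaries quoted above from one run to the next, the one-line output can be split into per-state PG counts. This is only a minimal sketch: it assumes the wrapped mail text has been rejoined into a single line, that items are separated by semicolons as in the output above, and the function name `parse_health` is made up for illustration.

```python
import re


def parse_health(line):
    """Split a one-line `ceph health` summary into (status, pg_counts, other)."""
    status, _, detail = line.strip().partition(" ")
    pg_counts = {}
    other = []
    for item in (part.strip() for part in detail.split(";")):
        if not item:
            continue
        match = re.fullmatch(r"(\d+) pgs (.+)", item)
        if match:
            # e.g. "96 pgs incomplete" -> {"incomplete": 96}
            pg_counts[match.group(2)] = int(match.group(1))
        else:
            # anything that is not a PG count: flags, blocked requests, ...
            other.append(item)
    return status, pg_counts, other


if __name__ == "__main__":
    sample = ("HEALTH_WARN 96 pgs incomplete; 96 pgs peering; "
              "192 pgs stuck inactive; 192 pgs stuck unclean")
    print(parse_health(sample))
```

Running it periodically and diffing the returned counts shows whether any PGs are leaving the incomplete or peering states, which is easier to read than the raw warning line.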
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> From: Brian Rak [mailto:brak at gameservers.com]
>>> Sent: Friday, August 01, 2014 6:14 PM
>>> To: Bruce McFarland; ceph-users at lists.ceph.com
>>> Subject: Re: [ceph-users] Firefly OSDs stuck in creating state forever
>>> 
>>> 
>>> 
>>> What happens if you remove nodown?  I'd be interested to see what OSDs it
>>> thinks are down. My next thought would be tcpdump on the private interface. 
>>> See if the OSDs are actually managing to connect to each other.
>>> 
>>> For comparison, when I bring up a cluster of 3 OSDs it goes to HEALTH_OK
>>> nearly instantly (definitely under a minute!), so it's probably not just
>>> taking a while.
>>> 
>>> Does 'ceph osd dump' show the proper public and private IPs?
>>> 
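One way to act on the 'ceph osd dump' question above is to check each OSD's cluster address against the private subnet. This is a rough sketch under several assumptions: the dump is taken with `ceph osd dump --format json` and already parsed into a dict, the entries under `osds` carry `osd` and `cluster_addr` fields (verify these names against your own output), addresses are IPv4 in `ip:port/nonce` form, and the subnet and sample addresses below are hypothetical.

```python
import ipaddress


def osds_outside_network(osd_dump, cluster_cidr):
    """Return (osd_id, cluster_addr) for OSDs whose cluster address
    is not inside the expected private network."""
    network = ipaddress.ip_network(cluster_cidr)
    outside = []
    for osd in osd_dump.get("osds", []):
        # Assumed IPv4 form "ip:port/nonce"; keep only the IP part.
        host = osd["cluster_addr"].split(":")[0]
        if ipaddress.ip_address(host) not in network:
            outside.append((osd["osd"], osd["cluster_addr"]))
    return outside


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the cluster in this thread.
    sample = {
        "osds": [
            {"osd": 0, "cluster_addr": "192.168.10.11:6801/2210"},
            {"osd": 1, "cluster_addr": "192.168.10.12:6801/2315"},
            {"osd": 2, "cluster_addr": "10.0.0.13:6801/2402"},
        ]
    }
    print(osds_outside_network(sample, "192.168.10.0/24"))
```

An OSD that shows up in the result is advertising an address outside the private network, which would be consistent with the broken private networking described in this thread.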
>>> On 8/1/2014 6:13 PM, Bruce McFarland wrote:
>>> 
>>>     MDS: I assumed that I'd need to bring up a ceph-mds for my
>>>     cluster at initial bringup. We also intended to modify the CRUSH
>>>     map such that its pool is resident on SSD(s). It is one of the
>>>     areas of the online docs where there doesn't seem to be a lot of
>>>     info, and I haven't spent a lot of time researching it. I'll stop it.
>>> 
>>> 
>>> 
>>>     OSD connectivity:  The connectivity is good for both 1GE and
>>>     10GE. I thought moving to 10GE with nothing else on that net
>>>     might help with placement groups etc. and bring up the PGs
>>>     quicker. I've checked "tcpdump" output on all boxes.
>>> 
>>>     Firewall: Thanks for that one - it's the "basic" I overlooked
>>>     in my ceph learning curve. One of the OSDs had selinux=enforcing
>>>     - all others were disabled. After changing that box, the 10 PGs
>>>     in my demo-pool (kept PG count very small for sanity) are now
>>>     "active+clean". The PGs for the default pools - data,
>>>     metadata, rbd - are still stuck in creating+peering or
>>>     creating+incomplete. I did have to manually set "osd pool
>>>     default min size = 1" from its default of 2 for these 3 pools
>>>     to eliminate a bunch of warnings in the "ceph health detail"
>>>     output.
>>> 
>>>     I'm adding the [mon] setting you suggested below and stopping
>>>     ceph-mds and bringing everything up now.
>>> 
>>>     [root@essperf3 Ceph]# ceph -s
>>>         cluster 4b3ffe60-73f4-4512-b7da-b04e4775dd73
>>>          health HEALTH_WARN 96 pgs incomplete; 96 pgs peering; 192
>>>     pgs stuck inactive; 192 pgs stuck unclean; 28 requests are
>>>     blocked > 32 sec; nodown,noscrub flag(s) set
>>>          monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0},
>>>     election epoch 1, quorum 0 essperf3
>>>          mdsmap e43: 1/1/1 up {0=essperf3=up:creating}
>>>          osdmap e752: 3 osds: 3 up, 3 in
>>>                 flags nodown,noscrub
>>>           pgmap v1483: 202 pgs, 4 pools, 0 bytes data, 0 objects
>>>                 134 MB used, 1158 GB / 1158 GB avail
>>>                       96 creating+peering
>>>                       10 active+clean
>>>     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<!!!!!!!!
>>>                       96 creating+incomplete
>>>     [root@essperf3 Ceph]#
>>> 
>>> 
>>> 
>>>     From: Brian Rak [mailto:brak at gameservers.com]
>>>     Sent: Friday, August 01, 2014 2:54 PM
>>>     To: Bruce McFarland; ceph-users at lists.ceph.com
>>>     Subject: Re: [ceph-users] Firefly OSDs stuck in creating state
>>>     forever
>>> 
>>> 
>>> 
>>> Why do you have a MDS active?  I'd suggest getting rid of that at
>>> least until you have everything else working.
>>> 
>>> I see you've set nodown on the OSDs, did you have problems with the
>>> OSDs flapping?  Do the OSDs have broken connectivity between
>>> themselves?  Do you have some kind of firewall interfering here?
>>> 
>>> I've seen odd issues when the OSDs have broken private networking,
>>> you'll get one OSD marking all the other ones down.  Adding this to my
>>> config helped:
>>> 
>>> [mon]
>>> mon osd min down reporters = 2
>>> 
>>> 
>>> On 8/1/2014 5:41 PM, Bruce McFarland wrote:
>>> 
>>>     Hello,
>>> 
>>>     I've run out of ideas and assume I've overlooked something
>>>     very basic. I've created 2 ceph clusters in the last 2
>>>     weeks with different OSD HW and private network fabrics -
>>>     1GE and 10GE. I have never been able to get the OSDs to
>>>     come up to the "active+clean" state. I have followed your
>>>     online documentation and at this point the only thing I
>>>     don't think I've done is modifying the CRUSH map (although
>>>     I have been looking into that). These are new clusters
>>>     with no data and only 1 HDD and 1 SSD per OSD (24 2.5GHz
>>>     cores with 64GB RAM).
>>> 
>>> 
>>> 
>>>     Since the disks are being recycled, is there something I
>>>     need to flag to let ceph just create its mappings, but
>>>     not scrub for data compatibility? I've tried setting the
>>>     noscrub flag to no effect.
>>> 
>>> 
>>> 
>>>     I also have constant OSD flapping. I've set nodown, but
>>>     assume that is just masking a problem that is still
>>>     occurring.
>>> 
>>> 
>>> 
>>>     Besides never reaching the "active+clean" state,
>>>     ceph-mon always crashes after leaving it running
>>>     overnight. The OSDs all eventually fill /root with
>>>     ceph logs, so I regularly have to bring everything down,
>>>     delete logs, and restart.
>>> 
>>> 
>>> 
>>>     I have all sorts of output from the ceph.conf; osd boot
>>>     output with "debug osd = 20" and "debug ms = 1"; ceph -w
>>>     output; and pretty much all of the debug/monitoring
>>>     suggestions from the online docs and 2 weeks of google
>>>     searches from online references in blogs, mailing lists
>>>     etc.
>>> 
>>> 
>>> 
>>>     [root@essperf3 Ceph]# ceph -v
>>>     ceph version 0.80.1
>>>     (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
>>>     [root@essperf3 Ceph]# ceph -s
>>>         cluster 4b3ffe60-73f4-4512-b7da-b04e4775dd73
>>>          health HEALTH_WARN 96 pgs incomplete; 106 pgs
>>>     peering; 202 pgs stuck inactive; 202 pgs stuck unclean;
>>>     nodown,noscrub flag(s) set
>>>          monmap e1: 1 mons at
>>>     {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum
>>>     0 essperf3
>>>          mdsmap e43: 1/1/1 up {0=essperf3=up:creating}
>>>          osdmap e752: 3 osds: 3 up, 3 in
>>>                 flags nodown,noscrub
>>>           pgmap v1476: 202 pgs, 4 pools, 0 bytes data, 0
>>>     objects
>>>                 134 MB used, 1158 GB / 1158 GB avail
>>>                      106 creating+peering
>>>                       96 creating+incomplete
>>>     [root@essperf3 Ceph]#
>>> 
>>> 
>>> 
>>>     Suggestions?
>>> 
>>>     Thanks,
>>> 
>>>     Bruce