Firefly OSDs stuck in creating state forever

Hello,
I've run out of ideas and assume I've overlooked something very basic. I've created two Ceph clusters in the last two weeks with different OSD hardware and private network fabrics (1 GbE and 10 GbE), and I have never been able to get the OSDs to reach the 'active+clean' state. I have followed the online documentation, and at this point the only thing I don't think I've done is modify the CRUSH map (although I have been looking into that). These are new clusters with no data and only 1 HDD and 1 SSD per OSD (24 2.5 GHz cores with 64 GB RAM).
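In case it's relevant, here's roughly what I've been looking at on the CRUSH/replication side so far (the 'rbd' pool name and the settings shown are just the defaults as I understand them, so this is a sketch of what I'm checking rather than my exact config):

  # how the OSDs are laid out and what the default rule requires
  ceph osd tree
  ceph osd crush rule dump

  # replication settings on the default pool
  ceph osd pool get rbd size
  ceph osd pool get rbd min_size

  # ceph.conf [global] settings I've been considering for a small test cluster
  # (my understanding is the chooseleaf setting only affects the CRUSH map
  #  generated at cluster creation time):
  #   osd pool default size = 2
  #   osd crush chooseleaf type = 0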

Since the disks are being recycled, is there something I need to flag to let Ceph just create its mappings without scrubbing the old data on them? I've tried setting the noscrub flag, to no effect.
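For the recycled disks themselves, the only preparation I know of is wiping the old partition tables before creating the OSDs, along these lines (host and device names are examples from my setup):

  # from the admin node, via ceph-deploy
  ceph-deploy disk zap essperf3:sdb

  # or directly on the OSD host
  ceph-disk zap /dev/sdb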

I also see constant OSD flapping. I've set nodown, but I assume that is just masking a problem that is still occurring.
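The flapping made me suspect the private network, so I've also been checking that the cluster network the OSDs heartbeat over is what I think it is and is actually reachable (the subnets below are placeholders for my setup):

  # what the running OSDs think their networks are (run on the OSD host)
  ceph daemon osd.0 config show | grep network

  # the corresponding ceph.conf [global] entries
  #   public network  = 209.243.160.0/24
  #   cluster network = 10.0.0.0/24

  # plain connectivity between OSD hosts on the cluster network
  ping -c 3 10.0.0.2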

Besides never reaching the 'active+clean' state, ceph-mon always crashes after being left running overnight. The OSDs all eventually fill /root with Ceph logs, so I regularly have to bring everything down, delete the logs, and restart.
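On the logs filling /root: I assume they should be going to /var/log/ceph instead, so I've been checking where the daemons are actually writing (the path shown is my understanding of the default, not necessarily what I have):

  # where a running OSD is actually writing its log (run on the OSD host)
  ceph daemon osd.0 config show | grep log_file

  # ceph.conf entry I believe controls this
  #   [global]
  #   log file = /var/log/ceph/$cluster-$name.log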

I have all sorts of output to share: ceph.conf; OSD boot output with 'debug osd = 20' and 'debug ms = 1'; 'ceph -w' output; and results from pretty much all of the debug/monitoring suggestions in the online docs and from two weeks of Google searches through blogs, mailing lists, etc.
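For reference, the debug settings I've been running with look roughly like this (I can post the full logs if useful):

  # ceph.conf
  [osd]
      debug osd = 20
      debug ms = 1

  # or injected at runtime without restarting the OSDs
  ceph tell osd.* injectargs '--debug-osd 20 --debug-ms 1'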

[root@essperf3 Ceph]# ceph -v
ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
[root@essperf3 Ceph]# ceph -s
    cluster 4b3ffe60-73f4-4512-b7da-b04e4775dd73
     health HEALTH_WARN 96 pgs incomplete; 106 pgs peering; 202 pgs stuck inactive; 202 pgs stuck unclean; nodown,noscrub flag(s) set
     monmap e1: 1 mons at {essperf3=209.243.160.35:6789/0}, election epoch 1, quorum 0 essperf3
     mdsmap e43: 1/1/1 up {0=essperf3=up:creating}
     osdmap e752: 3 osds: 3 up, 3 in
            flags nodown,noscrub
      pgmap v1476: 202 pgs, 4 pools, 0 bytes data, 0 objects
            134 MB used, 1158 GB / 1158 GB avail
                 106 creating+peering
                  96 creating+incomplete
[root@essperf3 Ceph]#
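I've also been poking at the stuck PGs directly; these are the commands I've been using to gather more detail (the PG id is just an example, I substitute one reported by dump_stuck):

  ceph health detail
  ceph pg dump_stuck inactive
  ceph pg dump_stuck unclean
  ceph pg 0.5 query
  ceph osd dump | grep pool    # shows size/min_size/crush_ruleset per pool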

Suggestions?
Thanks,
Bruce