I configured a three-monitor Ceph cluster following the manual instructions at http://ceph.com/docs/v0.80.5/install/manual-deployment/ and http://ceph.com/docs/master/rados/operations/add-or-rm-mons/ . The monitor cluster came up without a problem and seems to be fine. "ceph -s" currently shows this (I didn't capture what it said before I added the OSDs, but it was probably roughly the same):

    cluster f6c14635-1e04-497e-b782-dbba65c70257
     health HEALTH_WARN 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e1: 3 mons at {curly=10.38.56.3:6789/0,larry=10.38.56.2:6789/0,moe=10.38.56.4:6789/0}, election epoch 10, quorum 0,1,2 larry,curly,moe
     osdmap e35: 15 osds: 0 up, 0 in
      pgmap v36: 192 pgs, 3 pools, 0 bytes data, 0 objects
            0 kB used, 0 kB / 0 kB avail
                 192 creating

So, aside from the OSDs, this looks fine. I then added fifteen OSD daemons, spread across two of the machines in the cluster. I again followed the instructions on the manual deployment page, which have always worked for me in the past. This time, none of the daemons ever gets marked "up" or "in", and Google isn't helping me much either.

What I can determine so far: "ps awx" on the two storage machines shows that the ceph-osd processes are running (with stable PIDs). A sample "netstat -rn | grep 6789" looks like this:

    tcp   0   0 10.38.56.2:6789    0.0.0.0:*          LISTEN       55585/ceph-mon
    tcp   0   0 10.38.56.2:40219   10.38.56.3:6789    ESTABLISHED  19081/ceph-osd
    tcp   0   0 10.38.56.2:6789    10.38.56.3:60891   ESTABLISHED  55585/ceph-mon
    tcp   0   0 10.38.56.2:60586   10.38.56.4:6789    ESTABLISHED  9830/ceph-osd
    tcp   0   0 10.38.56.2:6789    10.38.56.3:60856   ESTABLISHED  55585/ceph-mon
    tcp   0   0 10.38.56.2:60606   10.38.56.4:6789    ESTABLISHED  20424/ceph-osd
    tcp   0   0 10.38.56.2:40207   10.38.56.3:6789    ESTABLISHED  13247/ceph-osd
    tcp   0   0 10.38.56.2:54488   10.38.56.2:6789    ESTABLISHED  16445/ceph-osd
    tcp   0   0 10.38.56.2:60610   10.38.56.4:6789    ESTABLISHED  24939/ceph-osd
    tcp   0   0 10.38.56.2:6789    10.38.56.2:54488   ESTABLISHED  55585/ceph-mon
    tcp   0   0 10.38.56.2:60560   10.38.56.4:6789    ESTABLISHED  55585/ceph-mon
    tcp   0   0 10.38.56.2:40211   10.38.56.3:6789    ESTABLISHED  14662/ceph-osd

The other storage machine looks roughly the same. It looks to me like the OSDs are running and are connected to the monitors.
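For reference, the per-OSD steps I ran were essentially the long-form ones from that manual deployment page. Roughly the following, with the osd number, weight, and hostname placeholders illustrative (the data directories were already formatted and mounted):

    # Register a new OSD id with the cluster (prints the osd number).
    ceph osd create

    # Initialize the data directory and generate the daemon's key.
    ceph-osd -i {osd-num} --mkfs --mkkey

    # Register the key, with the caps shown in "ceph auth list" below.
    ceph auth add osd.{osd-num} osd 'allow *' mon 'allow rwx' \
        -i /var/lib/ceph/osd/ceph-{osd-num}/keyring

    # Add the host and osd to the CRUSH map.
    ceph osd crush add-bucket {hostname} host
    ceph osd crush move {hostname} root=default
    ceph osd crush add osd.{osd-num} {weight} host={hostname}

    # Start the daemon (sysvinit-style here; the exact start command depends on the init system).
    /etc/init.d/ceph start osd.{osd-num}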
"ceph auth list" looks like this (keys blanked out): installed auth entries: osd.0 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.1 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.10 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.11 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.12 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.13 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.14 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.2 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.3 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.4 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.5 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.6 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.7 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.8 key: XXX caps: [mon] allow rwx caps: [osd] allow * osd.9 key: XXX caps: [mon] allow rwx caps: [osd] allow * client.admin key: XXX caps: [mds] allow caps: [mon] allow * caps: [osd] allow * "ceph osd tree" gives: # id weight type name up/down reweight -1 16.12 root default -2 8.663 host larry 0 0.932 osd.0 down 0 1 1.4 osd.1 down 0 2 1.4 osd.2 down 0 3 1.4 osd.3 down 0 4 0.932 osd.4 down 0 5 1.9 osd.5 down 0 6 0.699 osd.6 down 0 -3 7.456 host curly 7 0.932 osd.7 down 0 8 0.932 osd.8 down 0 9 0.932 osd.9 down 0 10 0.932 osd.10 down 0 11 0.932 osd.11 down 0 12 0.932 osd.12 down 0 13 0.932 osd.13 down 0 14 0.932 osd.14 down 0 My /var/log/ceph/osd-*.log files don't have anything in them that look like errors. They mostly end with some lines about "crush map has features..." that come after "done with init, starting boot process". On an osd that I restarted, the log just ends with the "starting boot process" line. Finally, my ceph.conf looks like this: [global] fsid = f6c14635-1e04-497e-b782-dbba65c70257 mon initial members = larry,curly,moe mon host = 10.38.56.2,10.38.56.3,10.38.56.4 public network = 10.38.56.0/24 cluster network = 10.29.38.0/24 auth cluster required = cephx auth service required = cephx auth client required = cephx osd journal size = 10000 filestore max sync interval = 5 filestore xattr use omap = false osd pool default size = 2 # Write an object n times. osd pool default min size = 2 # Allow writing n copy in a degraded state. 
Finally, my ceph.conf looks like this:

    [global]
    fsid = f6c14635-1e04-497e-b782-dbba65c70257
    mon initial members = larry,curly,moe
    mon host = 10.38.56.2,10.38.56.3,10.38.56.4
    public network = 10.38.56.0/24
    cluster network = 10.29.38.0/24
    auth cluster required = cephx
    auth service required = cephx
    auth client required = cephx
    osd journal size = 10000
    filestore max sync interval = 5
    filestore xattr use omap = false
    osd pool default size = 2          # Write an object n times.
    osd pool default min size = 2      # Allow writing n copies in a degraded state.
    osd pool default pg num = 500
    osd pool default pgp num = 500
    osd crush chooseleaf type = 1
    # osd crush chooseleaf type = 0

    [osd.0]
    public address = 10.38.56.2
    cluster address = 10.29.38.2

    [osd.1]
    public address = 10.38.56.2
    cluster address = 10.29.38.2

    [osd.2]
    public address = 10.38.56.2
    cluster address = 10.29.38.2

    [osd.3]
    public address = 10.38.56.2
    cluster address = 10.29.38.2

    [osd.4]
    public address = 10.38.56.2
    cluster address = 10.29.38.2

    [osd.5]
    public address = 10.38.56.2
    cluster address = 10.29.38.2

    [osd.6]
    public address = 10.38.56.2
    cluster address = 10.29.38.2

    [osd.7]
    public address = 10.38.56.3
    cluster address = 10.29.38.3

    [osd.8]
    public address = 10.38.56.3
    cluster address = 10.29.38.3

    [osd.9]
    public address = 10.38.56.3
    cluster address = 10.29.38.3

    [osd.10]
    public address = 10.38.56.3
    cluster address = 10.29.38.3

    [osd.11]
    public address = 10.38.56.3
    cluster address = 10.29.38.3

    [osd.12]
    public address = 10.38.56.3
    cluster address = 10.29.38.3

    [osd.13]
    public address = 10.38.56.3
    cluster address = 10.29.38.3

    [osd.14]
    public address = 10.38.56.3
    cluster address = 10.29.38.3

    [mds.0]
    host = larry

    [mon.curly]
    mon addr = 10.38.56.2

    [mon.larry]
    mon addr = 10.38.56.3

    [mon.moe]
    mon addr = 10.38.56.4

I added the [mon.X] sections later to see if they would change anything; they didn't. I really have no idea what's going on here. Any advice would be appreciated.
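If more log detail would help, I can turn up debugging on one of the OSDs and double-check that the cluster network actually passes traffic between the two storage hosts; roughly something like this (default admin-socket path assumed, addresses taken from the config above):

    # Raise OSD and messenger debugging on one daemon via its admin socket.
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set debug_osd 20
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config set debug_ms 1

    # From larry (cluster address 10.29.38.2), check that curly's cluster address answers.
    ping -c 3 -I 10.29.38.2 10.29.38.3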