Matt, what does 'ceph mon stat' say when your cluster is healthy, and what does it say when it's unhealthy? Again, my example:

# ceph mon stat
e3: 3 mons at {node01=10.32.0.181:6789/0,node02=10.32.0.182:6789/0,node03=10.32.0.183:6789/0}, election epoch 14, quorum 0,1,2 node01,node02,node03

Wolfgang

On 01/01/2014 10:29 PM, Matt Rabbitt wrote:
> I only have four because I want to remove the original one I used to
> create the cluster. I tried what you suggested and rebooted all my
> nodes, but I'm still having the same problem. I'm running Emperor on
> Ubuntu 12.04 on all my nodes, by the way. Here is what I'm seeing as I
> run ceph -w and reboot my original monitor:
>
>      osdmap e124: 12 osds: 12 up, 12 in
>       pgmap v26271: 528 pgs, 3 pools, 6979 MB data, 1883 objects
>             20485 MB used, 44670 GB / 44690 GB avail
>                  528 active+clean
>
> 2014-01-01 16:21:30.807305 mon.0 [INF] pgmap v26271: 528 pgs: 528 active+clean; 6979 MB data, 20485 MB used, 44670 GB / 44690 GB avail
> 2014-01-01 16:22:06.098971 7f272d539700  0 monclient: hunting for new mon
> 2014-01-01 16:23:04.823206 7fe84c1bb700  0 -- :/1019476 >> 10.0.10.11:6789/0 pipe(0x7fe840009090 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe8400092f0).fault
> 2014-01-01 16:23:07.821642 7fe8443f9700  0 -- :/1019476 >> 10.0.10.11:6789/0 pipe(0x7fe840004140 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe8400043a0).fault
>
> ^this fault error continues until the monitor comes back online.
>
> On Wed, Jan 1, 2014 at 4:04 PM, Wolfgang Hennerbichler <wogri@xxxxxxxxx> wrote:
>
>> Matt,
>>
>> first of all: four monitors is a bad idea. Use an odd number of mons,
>> e.g. three. Your other problem is your configuration file: the
>> mon_initial_members and mon_host directives should include all
>> monitor daemons. See my cluster:
>>
>> mon_initial_members = node01,node02,node03
>> mon_host = 10.32.0.181,10.32.0.182,10.32.0.183
>>
>> hth
>> wogri
>> --
>> http://www.wogri.at
>>
>> On 01 Jan 2014, at 21:55, Matt Rabbitt <mlrabbitt@xxxxxxxxx> wrote:
>>
>>> I created a cluster, four monitors, and 12 OSDs using the ceph-deploy
>>> tool. I initially created this cluster with one monitor, then added a
>>> "public network" statement in ceph.conf so that I could use
>>> ceph-deploy to add the other monitors. When I run ceph -w now,
>>> everything checks out: all monitors and OSDs show up, and I can read
>>> and write data to my pool. The problem is that when I shut down the
>>> monitor I initially used to configure the cluster, nothing works
>>> anymore. If I run ceph -w, all I get is fault errors about that first
>>> monitor being down, and I can't read or write data even though the
>>> other three monitors are still up. What did I do wrong here? I've
>>> been looking over the documentation and I see all kinds of info about
>>> having a mon addr attribute in my config or a public IP in the [mon]
>>> section, but my config doesn't have anything like that in it.
>>> Here is my complete config:
>>>
>>> [global]
>>> fsid = a0ab5715-f9e6-4d71-8da6-0ad976ac350c
>>> mon_initial_members = storage1
>>> mon_host = 10.0.10.11
>>> auth_supported = cephx
>>> osd_journal_size = 6144
>>> filestore_xattr_use_omap = true
>>> public network = 10.0.10.0/24

--
http://www.wogri.com
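
As an illustration of Wolfgang's point, a [global] section that names every monitor might look like the sketch below. The hostnames storage2-storage4 and the addresses 10.0.10.12-10.0.10.14 are placeholders for the monitors that were added later; the real names and addresses from the monmap would have to be substituted, and the updated ceph.conf distributed to every node and client (for example with ceph-deploy config push) before the first monitor is taken down.

[global]
fsid = a0ab5715-f9e6-4d71-8da6-0ad976ac350c
# List every monitor, not just the first one, so clients can still
# reach a quorum member when storage1 is offline. The hostnames
# storage2-4 and the .12-.14 addresses are placeholders.
mon_initial_members = storage1,storage2,storage3,storage4
mon_host = 10.0.10.11,10.0.10.12,10.0.10.13,10.0.10.14
auth_supported = cephx
osd_journal_size = 6144
filestore_xattr_use_omap = true
public network = 10.0.10.0/24

The reason this matters is that mon_host (or mon addr) is where a client looks to find a monitor to connect to; the monitors themselves learn about each other from the monmap, so the quorum can be healthy while every client is still pinned to 10.0.10.11. Once the original monitor is eventually removed, both lines would shrink to the remaining (odd-numbered) three mons.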
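
To answer the 'ceph mon stat' question while the first monitor is down, the client has to be pointed at a monitor it can actually reach, since the ceph.conf above only lists 10.0.10.11. A rough sketch, with 10.0.10.12 standing in for one of the surviving monitors:

# ceph -m 10.0.10.12 mon stat
# ceph -m 10.0.10.12 quorum_status

mon stat prints the monmap epoch and the current quorum members, similar to the example at the top of this mail; quorum_status returns more detail, including the election epoch and which monitors are in quorum. If these still show all the other monitors in quorum while storage1 is off, the problem is the client-side monitor list rather than the monitors themselves.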