I only have four because I want to remove the original one I used to create the cluster. I tried what you suggested and rebooted all my nodes, but I'm still having the same problem. I'm running Emperor on Ubuntu 12.04 on all my nodes, by the way. Here is what I see when I run ceph -w and reboot my original monitor:
osdmap e124: 12 osds: 12 up, 12 in
pgmap v26271: 528 pgs, 3 pools, 6979 MB data, 1883 objects
20485 MB used, 44670 GB / 44690 GB avail
528 active+clean
2014-01-01 16:21:30.807305 mon.0 [INF] pgmap v26271: 528 pgs: 528 active+clean; 6979 MB data, 20485 MB used, 44670 GB / 44690 GB avail
2014-01-01 16:22:06.098971 7f272d539700 0 monclient: hunting for new mon
2014-01-01 16:23:04.823206 7fe84c1bb700 0 -- :/1019476 >> 10.0.10.11:6789/0 pipe(0x7fe840009090 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe8400092f0).fault
2014-01-01 16:23:07.821642 7fe8443f9700 0 -- :/1019476 >> 10.0.10.11:6789/0 pipe(0x7fe840004140 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe8400043a0).fault
^this fault error continues until the monitor comes back online.
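Every fault line above targets 10.0.10.11:6789, which is the only address listed under mon_host in the config quoted further down. As a rough sketch (the helper name bootstrap_monitors is made up for illustration, it is not part of Ceph, and it only handles the underscore spelling of the key), this is the list of monitor addresses a freshly started client could read from that file:

```python
import configparser

# Relevant lines copied from the ceph.conf quoted later in this thread.
CEPH_CONF = """
[global]
fsid = a0ab5715-f9e6-4d71-8da6-0ad976ac350c
mon_initial_members = storage1
mon_host = 10.0.10.11
public network = 10.0.10.0/24
"""

def bootstrap_monitors(conf_text):
    """Hypothetical helper: return the addresses listed under mon_host."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    hosts = parser.get("global", "mon_host", fallback="")
    # mon_host is a comma-separated list; drop empty entries and whitespace.
    return [h.strip() for h in hosts.split(",") if h.strip()]

monitors = bootstrap_monitors(CEPH_CONF)
print(monitors)
```

With only one entry in that list, a client that starts while that monitor is down has no other address from the file to try, which would be consistent with the repeated faults.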
On Wed, Jan 1, 2014 at 4:04 PM, Wolfgang Hennerbichler <wogri@xxxxxxxxx> wrote:
Matt,
first of all: four monitors is a bad idea. use an odd number for mons, e.g. three. your other problem is your configuration file. the mon_initial_members and mon_host directives should include all monitor daemons. see my cluster:
mon_initial_members = node01,node02,node03
mon_host = 10.32.0.181,10.32.0.182,10.32.0.183
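to illustrate the odd-number advice: monitors need a strict majority to form a quorum, so a fourth monitor raises the quorum size without letting the cluster survive any more failures. a minimal sketch of the arithmetic (the function names are only for illustration):

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of the given number of monitors."""
    return monitors // 2 + 1

def tolerated_failures(monitors: int) -> int:
    """How many monitors can be down while a majority is still up."""
    return monitors - quorum_size(monitors)

for n in (1, 3, 4, 5):
    print(f"{n} mons: quorum needs {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} down")
```

three and four monitors both survive exactly one monitor being down; it takes five to survive two.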
hth
wogri
--
http://www.wogri.at
On 01 Jan 2014, at 21:55, Matt Rabbitt <mlrabbitt@xxxxxxxxx> wrote:
> I created a cluster, four monitors, and 12 OSDs using the ceph-deploy tool. I initially created this cluster with one monitor, then added a "public network" statement in ceph.conf so that I could use ceph-deploy to add the other monitors. When I run ceph -w now everything checks out and all monitors and OSDs show up and I can read and write data to my pool. The problem is when I shut down the monitor that I initially used to configure the cluster, nothing works anymore. If I run ceph -w all I get is fault errors about that first monitor being down, and I can't read or write data even though the other three monitors are still up. What did I do wrong here? I've been looking over the documentation and I see all kinds of info about having a mon addr attribute in my config or a public ip in the [mon] section but my config doesn't have anything like that in it. Here is my complete config:
>
> [global]
> fsid = a0ab5715-f9e6-4d71-8da6-0ad976ac350c
> mon_initial_members = storage1
> mon_host = 10.0.10.11
> auth_supported = cephx
> osd_journal_size = 6144
> filestore_xattr_use_omap = true
> public network = 10.0.10.0/24
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com