Re: Monitor configuration issue

On 01/01/2014 09:29 PM, Matt Rabbitt wrote:
I only have four because I want to remove the original one I used to
create the cluster.  I tried what you suggested and rebooted all my
nodes but I'm still having the same problem.  I'm running Emperor on
Ubuntu 12.04 on all my nodes by the way.  Here is what I'm seeing as I
run ceph -w and reboot my original monitor.

      osdmap e124: 12 osds: 12 up, 12 in
       pgmap v26271: 528 pgs, 3 pools, 6979 MB data, 1883 objects
             20485 MB used, 44670 GB / 44690 GB avail
                  528 active+clean


2014-01-01 16:21:30.807305 mon.0 [INF] pgmap v26271: 528 pgs: 528 active+clean; 6979 MB data, 20485 MB used, 44670 GB / 44690 GB avail
2014-01-01 16:22:06.098971 7f272d539700  0 monclient: hunting for new mon
2014-01-01 16:23:04.823206 7fe84c1bb700  0 -- :/1019476 >> 10.0.10.11:6789/0 pipe(0x7fe840009090 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe8400092f0).fault
2014-01-01 16:23:07.821642 7fe8443f9700  0 -- :/1019476 >> 10.0.10.11:6789/0 pipe(0x7fe840004140 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7fe8400043a0).fault

^ This fault error repeats until the monitor comes back online.

My best guess here is that your ceph.conf on the admin node (or wherever you're running that 'ceph -w' command) only contains mon.0's host/ip.
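
For example, with four monitors the client-side ceph.conf would need all of them listed, roughly like this (storage2-4 and the .12-.14 addresses are just placeholders; substitute your real monitor hostnames and IPs):

    [global]
    mon_initial_members = storage1, storage2, storage3, storage4
    mon_host = 10.0.10.11,10.0.10.12,10.0.10.13,10.0.10.14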

Those fault messages are shown whenever something (a client, a daemon) tries to connect to a monitor and fails. What *should* happen, if the remaining monitors are up and your ceph.conf is well configured, is that (in this case) the client tries another monitor in the list. So you'd see that fault message while it tries to connect to your down monitor, and it would succeed as soon as it manages to connect to another monitor.

If you want to check if this is actually what is happening, try running the command with '--debug-monc 10'. It should tell you which monitor it is attempting to connect to. Also, check your client's ceph.conf.
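
Something along these lines should show which monitors the client is hunting for (output will vary; --debug-ms 1 is optional messenger-level detail):

    # watch cluster events while logging monclient behaviour
    ceph -w --debug-monc 10
    # or a one-shot status check with messenger debugging as well
    ceph -s --debug-monc 10 --debug-ms 1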

Let us know how it goes.

  -Joao




On Wed, Jan 1, 2014 at 4:04 PM, Wolfgang Hennerbichler <wogri@xxxxxxxxx> wrote:

    Matt,

    first of all: four monitors is a bad idea; use an odd number of
    mons, e.g. three. your other problem is your configuration file:
    the mon_initial_members and mon_host directives should include all
    monitor daemons. see my cluster:

    mon_initial_members = node01,node02,node03
    mon_host = 10.32.0.181,10.32.0.182,10.32.0.183

    hth
    wogri
    --
    http://www.wogri.at

    On 01 Jan 2014, at 21:55, Matt Rabbitt <mlrabbitt@xxxxxxxxx> wrote:

     > I created a cluster, four monitors, and 12 OSDs using the
     > ceph-deploy tool.  I initially created this cluster with one
     > monitor, then added a "public network" statement in ceph.conf so
     > that I could use ceph-deploy to add the other monitors.  When I run
     > ceph -w now everything checks out and all monitors and OSDs show up
     > and I can read and write data to my pool.  The problem is when I
     > shut down the monitor that I initially used to configure the
     > cluster, nothing works anymore.  If I run ceph -w all I get is fault
     > errors about that first monitor being down, and I can't read or
     > write data even though the other three monitors are still up.  What
     > did I do wrong here?  I've been looking over the documentation and I
     > see all kinds of info about having a mon addr attribute in my config
     > or a public ip in the [mon] section but my config doesn't have
     > anything like that in it.  Here is my complete config:
     >
     > [global]
     > fsid = a0ab5715-f9e6-4d71-8da6-0ad976ac350c
     > mon_initial_members = storage1
     > mon_host = 10.0.10.11
     > auth_supported = cephx
     > osd_journal_size = 6144
     > filestore_xattr_use_omap = true
     > public network = 10.0.10.0/24







--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



