Re: Monitor configuration issue


 



On 01/02/2014 03:19 PM, Matt Rabbitt wrote:
Healthy:

e4: 4 mons at
{storage1=10.0.10.11:6789/0,storage2=10.0.10.12:6789/0,storage3=10.0.10.13:6789/0,storage4=10.0.10.14:6789/0},
election epoch 54, quorum 0,1,2,3 storage1,storage2,storage3,storage4


You should always have an odd number of monitors; it is better to have 3 than 4. This is due to the election process the monitors use (Paxos): a quorum requires a strict majority, so 4 monitors still only tolerate one failure, the same as 3.

So remove one monitor completely from the cluster or add a 5th one.
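
A rough sketch of the removal path, assuming the monitor you drop is called
storage4 (pick whichever one you like), Upstart-managed daemons on Ubuntu
12.04, and an admin keyring on the node you run this from:

# on the host whose monitor you are removing:
sudo stop ceph-mon id=storage4

# then, from any node with admin access, take it out of the monitor map:
ceph mon remove storage4

# or, if your ceph-deploy version supports it:
ceph-deploy mon destroy storage4

Afterwards also remove that monitor from mon_initial_members and mon_host in
ceph.conf on every node and client.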

Wido

After storage1 goes down I get this over and over again:

2014-01-02 09:16:23.789271 7fbbc82f3700  0 -- :/1000673 >>
10.0.10.11:6789/0 pipe(0x7fbbbc003b10 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7fbbbc003d70).fault

I'm issuing these commands from an admin node that isn't running any
monitors or OSDs, but the output is the same when I run them on any of
the monitors.


On Thu, Jan 2, 2014 at 2:46 AM, Wolfgang Hennerbichler <wogri@xxxxxxxxx> wrote:

    Matt,
    what does 'ceph mon stat' say when your cluster is healthy and what does
    it say when it's unhealthy?

    Again my example:

    # ceph mon stat
    e3: 3 mons at
    {node01=10.32.0.181:6789/0,node02=10.32.0.182:6789/0,node03=10.32.0.183:6789/0},
    election epoch 14, quorum 0,1,2 node01,node02,node03
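
    A related check worth comparing in both states is the quorum status; the
    output is JSON (exact fields may vary by release) and shows which
    monitors the cluster knows about and which of them are currently in
    quorum:

    # ceph quorum_status
    (look at the "quorum_names" list and the "monmap" section in the output)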

    Wolfgang

    On 01/01/2014 10:29 PM, Matt Rabbitt wrote:
     > I only have four because I want to remove the original one I used to
     > create the cluster.  I tried what you suggested and rebooted all my
     > nodes but I'm still having the same problem.  I'm running Emperor on
     > Ubuntu 12.04 on all my nodes by the way.  Here is what I'm seeing as I
     > run ceph -w and reboot my original monitor.
     >
     >      osdmap e124: 12 osds: 12 up, 12 in
     >       pgmap v26271: 528 pgs, 3 pools, 6979 MB data, 1883 objects
     >             20485 MB used, 44670 GB / 44690 GB avail
     >                  528 active+clean
     >
     >
     > 2014-01-01 16:21:30.807305 mon.0 [INF] pgmap v26271: 528 pgs: 528
     > active+clean; 6979 MB data, 20485 MB used, 44670 GB / 44690 GB avail
     > 2014-01-01 16:22:06.098971 7f272d539700  0 monclient: hunting for new mon
     > 2014-01-01 16:23:04.823206 7fe84c1bb700  0 -- :/1019476 >>
     > 10.0.10.11:6789/0 pipe(0x7fe840009090 sd=3 :0 s=1 pgs=0 cs=0 l=1
     > c=0x7fe8400092f0).fault
     > 2014-01-01 16:23:07.821642 7fe8443f9700  0 -- :/1019476 >>
     > 10.0.10.11:6789/0 pipe(0x7fe840004140 sd=3 :0 s=1 pgs=0 cs=0 l=1
     > c=0x7fe8400043a0).fault
     >
     > ^this fault error continues until the monitor comes back online.
     >
     >
     >
     > On Wed, Jan 1, 2014 at 4:04 PM, Wolfgang Hennerbichler <wogri@xxxxxxxxx> wrote:
     >
     >     Matt,
     >
     >     first of all: four monitors is a bad idea. use an odd number for
     >     mons, e. g. three. your other problem is your configuration file.
     >     the mon_initial_members and mon_host directives should include all
     >     monitor daemons. see my cluster:
     >
     >     mon_initial_members = node01,node02,node03
     >     mon_host = 10.32.0.181,10.32.0.182,10.32.0.183
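     >
     >     applied to the cluster in this thread, a rough sketch would be the
     >     following (using the storage1-4 names and 10.0.10.11-14 addresses
     >     posted elsewhere in this thread; drop one entry if you remove the
     >     fourth monitor):
     >
     >     mon_initial_members = storage1,storage2,storage3,storage4
     >     mon_host = 10.0.10.11,10.0.10.12,10.0.10.13,10.0.10.14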
     >
     >     hth
     >     wogri
     >     --
     > http://www.wogri.at
     >
     >     On 01 Jan 2014, at 21:55, Matt Rabbitt <mlrabbitt@xxxxxxxxx> wrote:
     >
     >     > I created a cluster, four monitors, and 12 OSDs using the
     >     > ceph-deploy tool.  I initially created this cluster with one
     >     > monitor, then added a "public network" statement in ceph.conf so
     >     > that I could use ceph-deploy to add the other monitors.  When I run
     >     > ceph -w now everything checks out and all monitors and OSDs show up
     >     > and I can read and write data to my pool.  The problem is when I
     >     > shut down the monitor that I initially used to configure the
     >     > cluster, nothing works anymore.  If I run ceph -w all I get is fault
     >     > errors about that first monitor being down, and I can't read or
     >     > write data even though the other three monitors are still up.  What
     >     > did I do wrong here?  I've been looking over the documentation and I
     >     > see all kinds of info about having a mon addr attribute in my config
     >     > or a public ip in the [mon] section but my config doesn't have
     >     > anything like that in it.  Here is my complete config:
     >     >
     >     > [global]
     >     > fsid = a0ab5715-f9e6-4d71-8da6-0ad976ac350c
     >     > mon_initial_members = storage1
     >     > mon_host = 10.0.10.11
     >     > auth_supported = cephx
     >     > osd_journal_size = 6144
     >     > filestore_xattr_use_omap = true
     >     > public network = 10.0.10.0/24
     >
     >


    --
    http://www.wogri.com







--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



