Re: One monitor won't start after upgrade from 6.1.3 to 6.1.4

Got it going.
This helped http://tracker.ceph.com/issues/5205

My ceph.conf had the public and cluster networks defined in [global]. I commented them out and mon.c started successfully.

[global]
        auth cluster required = cephx
        auth service required = cephx
        auth client required = cephx
#       public network = 192.168.6.0/24
#       cluster network = 10.6.0.0/16
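
For anyone wanting to script the same change, here is a minimal sketch. The function name comment_out_networks is made up for illustration. It assumes the settings are spelled "public network" / "cluster network" (or with underscores), and it comments them out in every section, not only [global]. It prints the edited config to stdout and does not touch the original file.

```shell
# Hypothetical helper: print ceph.conf with any active "public network" /
# "cluster network" (or underscore-spelled) settings commented out.
# Output goes to stdout; the input file is left untouched.
comment_out_networks() {
    sed -E 's/^([[:space:]]*)((public|cluster)[ _]network[[:space:]]*=)/#\1\2/' "$1"
}

# Example: review the result before replacing the real file.
# comment_out_networks /etc/ceph/ceph.conf > /tmp/ceph.conf.new
```

Lines that are already commented are left alone, so running it twice gives the same result.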

# ceph status
   health HEALTH_OK
   monmap e3: 3 mons at {a=192.168.6.101:6789/0,b=192.168.6.102:6789/0,c=192.168.6.103:6789/0}, election epoch 14230, quorum 0,1,2 a,b,c
   osdmap e1538: 18 osds: 17 up, 17 in
    pgmap v4064405: 5448 pgs: 5447 active+clean, 1 active+clean+scrubbing+deep; 5829 GB data, 11691 GB used, 34989 GB / 46681 GB avail; 328B/s rd, 816KB/s wr, 135op/s
   mdsmap e1: 0/0/1 up

Looks like there is a fix on the way.
Darryl

On 06/26/13 13:58, Darryl Bond wrote:
Nope, same outcome.

[root@ceph3 mon]# ceph mon remove c
removed mon.c at 192.168.6.103:6789/0, there are now 2 monitors
[root@ceph3 mon]# mkdir tmp
[root@ceph3 mon]# ceph auth get mon. -o tmp/keyring
exported keyring for mon.
[root@ceph3 mon]# ceph mon getmap -o tmp/monmap
2013-06-26 13:51:26.640097 7ffb48a12700  0 -- :/24748 >> 192.168.6.103:6789/0 pipe(0x1105350 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
got latest monmap
[root@ceph3 mon]# ls -l tmp
total 8
-rw-r--r--. 1 root root  55 Jun 26 13:51 keyring
-rw-r--r--. 1 root root 328 Jun 26 13:51 monmap
[root@ceph3 mon]# ceph-mon -i c --mkfs --monmap tmp/monmap --keyring tmp/keyring
ceph-mon: created monfs at /var/lib/ceph/mon/ceph-c for mon.c
[root@ceph3 mon]# ls ceph-c
keyring  store.db
[root@ceph3 mon]# ceph mon add c 192.168.6.103:6789
mon c 192.168.6.103:6789/0 already exists
[root@ceph3 mon]# ceph status
2013-06-26 13:53:58.401436 7f0dd653d700  0 -- :/25695 >> 192.168.6.103:6789/0 pipe(0x108e350 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
    health HEALTH_WARN 1 mons down, quorum 0,1 a,b
    monmap e3: 3 mons at {a=192.168.6.101:6789/0,b=192.168.6.102:6789/0,c=192.168.6.103:6789/0}, election epoch 14228, quorum 0,1 a,b
    osdmap e1342: 18 osds: 18 up, 18 in
     pgmap v4060824: 5448 pgs: 5448 active+clean; 5820 GB data, 11673 GB used, 35464 GB / 47137 GB avail; 2983KB/s rd, 1217KB/s wr, 552op/s
    mdsmap e1: 0/0/1 up
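
A quick way to check the monitor count from a script, as a rough sketch. The function name mons_down is made up for illustration, and it assumes the health line looks like the output above ("N mons down"); other Ceph releases may word it differently.

```shell
# Hypothetical helper: read `ceph status` text on stdin and print how many
# monitors the health line reports as down (0 if none are mentioned).
mons_down() {
    awk '/health/ && match($0, /[0-9]+ mons down/) {
             split(substr($0, RSTART, RLENGTH), parts, " ")
             n = parts[1]
         }
         END { print n + 0 }'
}

# Example: ceph status | mons_down
```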

[root@ceph3 mon]# service ceph start mon.c
=== mon.c ===
Starting Ceph mon.c on ceph3...
[25887]: (33) Numerical argument out of domain
failed: 'ulimit -n 8192;  /usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c /etc/ceph/ceph.conf '
Starting ceph-create-keys on ceph3...
[root@ceph3 mon]# ls ceph-c
keyring  store.db
[root@ceph3 mon]# ceph-mon -i c --public-addr 192.168.6.103:6789
[26768]: (33) Numerical argument out of domain

On 06/26/13 13:19, Mike Dawson wrote:
I've typically moved it off to a non-conflicting path in lieu of
deleting it outright, but either way should work. IIRC, I used something
like:

sudo mv /var/lib/ceph/mon/ceph-c /var/lib/ceph/mon/ceph-c-bak && sudo mkdir /var/lib/ceph/mon/ceph-c

- Mike

On 6/25/2013 11:08 PM, Darryl Bond wrote:
Thanks for your prompt response.
Given that my mon.c /var/lib/ceph/mon/ceph-c is currently populated,
should I delete its contents after removing the monitor and before
re-adding it?

Darryl

On 06/26/13 12:50, Mike Dawson wrote:
Darryl,

I've seen this issue a few times recently. I believe Joao was looking
into it at one point, but I don't know if it has been resolved (Any news
Joao?). Others have run into it too. Look closely at:

http://tracker.ceph.com/issues/4999
http://irclogs.ceph.widodh.nl/index.php?date=2013-06-07
http://irclogs.ceph.widodh.nl/index.php?date=2013-05-27
http://irclogs.ceph.widodh.nl/index.php?date=2013-05-25
http://irclogs.ceph.widodh.nl/index.php?date=2013-05-21
http://irclogs.ceph.widodh.nl/index.php?date=2013-05-15

I'd recommend you submit this as a bug on the tracker.

It sounds like you have reliable quorum between a and b, that's good.
The workaround that has worked for me is to remove mon.c, then re-add
it. Assuming your monitor leveldb stores aren't too large, the process
is rather quick. Follow the instructions at:

http://ceph.com/docs/next/rados/operations/add-or-rm-mons/#removing-monitors


then

http://ceph.com/docs/next/rados/operations/add-or-rm-mons/#adding-monitors
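
As a rough sketch of what that remove/re-add sequence looks like, the function below only prints the commands for review and runs nothing. The name print_mon_readd_plan is made up for illustration. The commands mirror those used elsewhere in this thread, and the paths assume the default /var/lib/ceph/mon/ceph-<id> layout and a scratch directory called tmp.

```shell
# Hypothetical helper: print (not run) the remove/re-add sequence for one
# monitor. Arguments: monitor id, monitor address (ip:port).
# Assumes the default /var/lib/ceph/mon/ceph-<id> data directory.
print_mon_readd_plan() {
    id=$1
    addr=$2
    dir=/var/lib/ceph/mon/ceph-$id
    cat <<EOF
ceph mon remove $id
mv $dir $dir-bak && mkdir $dir
mkdir tmp
ceph auth get mon. -o tmp/keyring
ceph mon getmap -o tmp/monmap
ceph-mon -i $id --mkfs --monmap tmp/monmap --keyring tmp/keyring
ceph mon add $id $addr
service ceph start mon.$id
EOF
}

# Example: print_mon_readd_plan c 192.168.6.103:6789
```

Printing the plan first makes it easy to check the monitor id and address before touching a cluster that is running on a two-monitor quorum.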


- Mike


On 6/25/2013 10:34 PM, Darryl Bond wrote:
Upgrading a cluster with 3 monitors from 0.61.3 to 0.61.4. The cluster had
been successfully upgraded from bobtail to cuttlefish and then from
0.61.2 to 0.61.3. There have been no changes to ceph.conf.

Node mon.a upgrade, a,b,c monitors OK after upgrade
Node mon.b upgrade a,b monitors OK after upgrade (note that c was not
available, even though I hadn't touched it)
Node mon.c very slow to install the upgrade, RAM was tight for some
reason and mon process was using half the RAM
Node mon.c shutdown mon.c
Node mon.c performed the upgrade
Node mon.c restart ceph - mon.c will not start


service ceph start mon.c

=== mon.c ===
Starting Ceph mon.c on ceph3...
[23992]: (33) Numerical argument out of domain
failed: 'ulimit -n 8192;  /usr/bin/ceph-mon -i c --pid-file /var/run/ceph/mon.c.pid -c /etc/ceph/ceph.conf '
Starting ceph-create-keys on ceph3...

      health HEALTH_WARN 1 mons down, quorum 0,1 a,b
      monmap e1: 3 mons at {a=192.168.6.101:6789/0,b=192.168.6.102:6789/0,c=192.168.6.103:6789/0}, election epoch 14224, quorum 0,1 a,b
      osdmap e1342: 18 osds: 18 up, 18 in
       pgmap v4058788: 5448 pgs: 5447 active+clean, 1 active+clean+scrubbing+deep; 5820 GB data, 11673 GB used, 35464 GB / 47137 GB avail; 813B/s rd, 643KB/s wr, 69op/s
      mdsmap e1: 0/0/1 up

Set debug mon = 20
Nothing going into logs other than assertion
--- begin dump of recent events ---
        0> 2013-06-26 12:20:36.383430 7fd5e81b57c0 -1 *** Caught signal (Aborted) **
    in thread 7fd5e81b57c0

    ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
    1: /usr/bin/ceph-mon() [0x596fe2]
    2: (()+0xf000) [0x7fd5e7820000]
    3: (gsignal()+0x35) [0x7fd5e619fba5]
    4: (abort()+0x148) [0x7fd5e61a1358]
    5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fd5e6a99e1d]
    6: (()+0x5eeb6) [0x7fd5e6a97eb6]
    7: (()+0x5eee3) [0x7fd5e6a97ee3]
    8: (()+0x5f10e) [0x7fd5e6a9810e]
    9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x40a) [0x64a6aa]
    10: /usr/bin/ceph-mon() [0x65f916]
    11: /usr/bin/ceph-mon() [0x6960e9]
    12: (pick_addresses(CephContext*)+0x8d) [0x69624d]
    13: (main()+0x1a8a) [0x49786a]
    14: (__libc_start_main()+0xf5) [0x7fd5e618ba05]
    15: /usr/bin/ceph-mon() [0x499a69]
    NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
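
The assert in this trace is raised under pick_addresses(), the frame just below main(). A small sketch for checking whether another crash dump has that frame follows. The function name crashed_in_pick_addresses is made up for illustration, and a match only indicates a similar call path, not necessarily the same bug.

```shell
# Hypothetical helper: succeed if the crash dump on stdin contains a
# pick_addresses() frame, as the backtrace in this thread does.
crashed_in_pick_addresses() {
    grep -qF 'pick_addresses('
}

# Example (log path is an assumption, adjust to your setup):
# crashed_in_pick_addresses < /var/log/ceph/ceph-mon.c.log && echo "pick_addresses frame found"
```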




