Re: Can't start ceph mon

On Mon, Nov 19, 2012 at 1:08 PM, Dave Humphreys (Datatone)
<dave@xxxxxxxxxxxxxx> wrote:
>
> I have a problem in which I can't start my ceph monitor. The log is shown below.
>
> The log shows version 0.54. I was running 0.52 when the problem arose, and I moved to the latest in case the newer version fixed the problem.
>
> The original failure happened a week or so ago, and may have been caused by running out of disk space when the ceph monitor log became huge.

That is almost certainly the case, although I thought we were handling
this issue better now.

> What should I do to recover the situation?

Do you have other monitors in working order? If so, the easiest fix
is to remove this monitor from the cluster and add it back in as a
new monitor with a fresh store. If not, we can look into
reconstructing it.
-Greg
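
A minimal sketch of the remove-and-re-add procedure suggested above, written as a Python dry run that only prints the commands. The monitor name vault01, the address 10.0.1.1:6789, and the store path /ceph/mon.vault01 come from the log; the temporary file paths and the helper name readd_monitor_plan are hypothetical. The ceph and ceph-mon invocations follow the documented add/remove monitor procedure, but their exact syntax on 0.54 is an assumption, so check them against the installed version before running anything.

```python
import shlex


def readd_monitor_plan(mon_id, mon_addr, store, tmp_dir="/tmp"):
    """Return the commands (as argv lists) for replacing one monitor's store.

    Assumes at least one other monitor is healthy and has quorum.
    Nothing is executed here; the caller decides what to run.
    """
    keyring = f"{tmp_dir}/mon.keyring"  # hypothetical scratch path
    monmap = f"{tmp_dir}/monmap"        # hypothetical scratch path
    return [
        # Drop the broken monitor from the cluster's monitor map.
        ["ceph", "mon", "remove", mon_id],
        # Keep the damaged store for later inspection, then start empty.
        ["mv", store, store + ".broken"],
        ["mkdir", "-p", store],
        # Fetch the monitor key and the current monitor map from the cluster.
        ["ceph", "auth", "get", "mon.", "-o", keyring],
        ["ceph", "mon", "getmap", "-o", monmap],
        # Build a fresh store for this monitor.
        ["ceph-mon", "-i", mon_id, "--mkfs",
         "--monmap", monmap, "--keyring", keyring],
        # Register the monitor again; starting the daemon is left to the admin.
        ["ceph", "mon", "add", mon_id, mon_addr],
    ]


if __name__ == "__main__":
    for argv in readd_monitor_plan("vault01", "10.0.1.1:6789",
                                   "/ceph/mon.vault01"):
        print(shlex.join(argv))
```

Printing the plan rather than running it leaves room to review each step, which matters here because the first command changes cluster membership.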

>
> David
>
> 2012-11-19 20:38:51.598468 7fc13fdc6780  0 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150), process ceph-mon, pid 21012
> 2012-11-19 20:38:51.598482 7fc13fdc6780  1 store(/ceph/mon.vault01) mount
> 2012-11-19 20:38:51.598527 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 21
> 2012-11-19 20:38:51.598542 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl magic = 21 bytes
> 2012-11-19 20:38:51.598562 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75
> 2012-11-19 20:38:51.598567 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes
> 2012-11-19 20:38:51.598582 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 205
> 2012-11-19 20:38:51.598586 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl monmap/latest = 205 bytes
> 2012-11-19 20:38:51.598809 7fc13fdc6780  1 -- 10.0.1.1:6789/0 learned my addr 10.0.1.1:6789/0
> 2012-11-19 20:38:51.598818 7fc13fdc6780  1 accepter.accepter.bind my_inst.addr is 10.0.1.1:6789/0 need_addr=0
> 2012-11-19 20:38:51.599498 7fc13fdc6780  1 -- 10.0.1.1:6789/0 messenger.start
> 2012-11-19 20:38:51.599508 7fc13fdc6780  1 accepter.accepter.start
> 2012-11-19 20:38:51.599610 7fc13fdc6780  1 mon.vault01@-1(probing) e1 init fsid 4d7d8d20-338c-4bdc-9918-9bcf04f9a832
> 2012-11-19 20:38:51.599674 7fc13cdbe700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c6c0 sd=14 :6789 pgs=0 cs=0 l=0).accept sd=14
> 2012-11-19 20:38:51.599678 7fc141eff700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c240 sd=9 :6789 pgs=0 cs=0 l=0).accept sd=9
> 2012-11-19 20:38:51.599718 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 37
> 2012-11-19 20:38:51.599723 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl cluster_uuid = 37 bytes
> 2012-11-19 20:38:51.599718 7fc13ccbd700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c480 sd=19 :6789 pgs=0 cs=0 l=0).accept sd=19
> 2012-11-19 20:38:51.599729 7fc13fdc6780 10 mon.vault01@-1(probing) e1 check_fsid cluster_uuid contains '4d7d8d20-338c-4bdc-9918-9bcf04f9a832'
> 2012-11-19 20:38:51.599739 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75
> 2012-11-19 20:38:51.599745 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes
> 2012-11-19 20:38:51.599751 7fc13fdc6780 10 mon.vault01@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18)}
> 2012-11-19 20:38:51.599759 7fc13fdc6780 15 store(/ceph/mon.vault01) exists_bl joined
> 2012-11-19 20:38:51.599769 7fc13fdc6780 10 mon.vault01@-1(probing) e1 has_ever_joined = 1
> 2012-11-19 20:38:51.599794 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/last_committed = 133333
> 2012-11-19 20:38:51.599801 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/first_committed = 132833
> 2012-11-19 20:38:51.599810 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 239840
> 2012-11-19 20:38:51.599928 7fc13cbbc700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213cd80 sd=20 :6789 pgs=0 cs=0 l=0).accept sd=20
> 2012-11-19 20:38:51.599950 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl pgmap/latest = 239840 bytes
> --- begin dump of recent events ---
> 2012-11-19 20:38:51.600509 7fc13fdc6780 -1 *** Caught signal (Aborted) **
>  in thread 7fc13fdc6780
>
>  ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150)
>  1: ceph-mon() [0x53adf8]
>  2: (()+0xfe90) [0x7fc141830e90]
>  3: (gsignal()+0x3e) [0x7fc140016dae]
>  4: (abort()+0x17b) [0x7fc14001825b]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc141af300d]
>  6: (()+0xb31b6) [0x7fc141af11b6]
>  7: (()+0xb31e3) [0x7fc141af11e3]
>  8: (()+0xb32de) [0x7fc141af12de]
>  9: ceph-mon() [0x5ecb9f]
>  10: (Paxos::get_stashed(ceph::buffer::list&)+0x1ed) [0x49e28d]
>  11: (Paxos::init()+0x109) [0x49e609]
>  12: (Monitor::init()+0x36a) [0x485a4a]
>  13: (main()+0x1289) [0x46d909]
>  14: (__libc_start_main()+0xed) [0x7fc14000364d]
>  15: ceph-mon() [0x46fa09]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
>    -55> 2012-11-19 20:38:51.596694 7fc13fdc6780  5 asok(0x213d000) register_command perfcounters_dump hook 0x2131050
>    -55> 2012-11-19 20:38:51.596720 7fc13fdc6780  5 asok(0x213d000) register_command 1 hook 0x2131050
>    -54> 2012-11-19 20:38:51.596725 7fc13fdc6780  5 asok(0x213d000) register_command perf dump hook 0x2131050
>    -53> 2012-11-19 20:38:51.596735 7fc13fdc6780  5 asok(0x213d000) register_command perfcounters_schema hook 0x2131050
>    -52> 2012-11-19 20:38:51.596740 7fc13fdc6780  5 asok(0x213d000) register_command 2 hook 0x2131050
>    -51> 2012-11-19 20:38:51.596745 7fc13fdc6780  5 asok(0x213d000) register_command perf schema hook 0x2131050
>    -50> 2012-11-19 20:38:51.596752 7fc13fdc6780  5 asok(0x213d000) register_command config show hook 0x2131050
>    -49> 2012-11-19 20:38:51.596756 7fc13fdc6780  5 asok(0x213d000) register_command config set hook 0x2131050
>    -48> 2012-11-19 20:38:51.596761 7fc13fdc6780  5 asok(0x213d000) register_command log flush hook 0x2131050
>    -47> 2012-11-19 20:38:51.596765 7fc13fdc6780  5 asok(0x213d000) register_command log dump hook 0x2131050
>    -46> 2012-11-19 20:38:51.596770 7fc13fdc6780  5 asok(0x213d000) register_command log reopen hook 0x2131050
>    -45> 2012-11-19 20:38:51.598468 7fc13fdc6780  0 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150), process ceph-mon, pid 21012
>    -44> 2012-11-19 20:38:51.598482 7fc13fdc6780  1 store(/ceph/mon.vault01) mount
>    -43> 2012-11-19 20:38:51.598527 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 21
>    -42> 2012-11-19 20:38:51.598542 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl magic = 21 bytes
>    -41> 2012-11-19 20:38:51.598562 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75
>    -40> 2012-11-19 20:38:51.598567 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes
>    -39> 2012-11-19 20:38:51.598582 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 205
>    -38> 2012-11-19 20:38:51.598586 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl monmap/latest = 205 bytes
>    -37> 2012-11-19 20:38:51.598809 7fc13fdc6780  1 -- 10.0.1.1:6789/0 learned my addr 10.0.1.1:6789/0
>    -36> 2012-11-19 20:38:51.598818 7fc13fdc6780  1 accepter.accepter.bind my_inst.addr is 10.0.1.1:6789/0 need_addr=0
>    -35> 2012-11-19 20:38:51.599219 7fc13fdc6780  1 finished global_init_daemonize
>    -34> 2012-11-19 20:38:51.599350 7fc13fdc6780  5 asok(0x213d000) init /var/run/ceph/ceph-mon.vault01.asok
>    -33> 2012-11-19 20:38:51.599371 7fc13fdc6780  5 asok(0x213d000) bind_and_listen /var/run/ceph/ceph-mon.vault01.asok
>    -32> 2012-11-19 20:38:51.599438 7fc13fdc6780  5 asok(0x213d000) register_command 0 hook 0x2130030
>    -31> 2012-11-19 20:38:51.599444 7fc13fdc6780  5 asok(0x213d000) register_command version hook 0x2130030
>    -30> 2012-11-19 20:38:51.599451 7fc13fdc6780  5 asok(0x213d000) register_command git_version hook 0x2130030
>    -29> 2012-11-19 20:38:51.599456 7fc13fdc6780  5 asok(0x213d000) register_command help hook 0x2131040
>    -28> 2012-11-19 20:38:51.599472 7fc13edc2700  5 asok(0x213d000) entry start
>    -27> 2012-11-19 20:38:51.599498 7fc13fdc6780  1 -- 10.0.1.1:6789/0 messenger.start
>    -26> 2012-11-19 20:38:51.599508 7fc13fdc6780  1 accepter.accepter.start
>    -25> 2012-11-19 20:38:51.599610 7fc13fdc6780  1 mon.vault01@-1(probing) e1 init fsid 4d7d8d20-338c-4bdc-9918-9bcf04f9a832
>    -24> 2012-11-19 20:38:51.599674 7fc13cdbe700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c6c0 sd=14 :6789 pgs=0 cs=0 l=0).accept sd=14
>    -23> 2012-11-19 20:38:51.599678 7fc141eff700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c240 sd=9 :6789 pgs=0 cs=0 l=0).accept sd=9
>    -22> 2012-11-19 20:38:51.599718 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 37
>    -21> 2012-11-19 20:38:51.599723 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl cluster_uuid = 37 bytes
>    -20> 2012-11-19 20:38:51.599718 7fc13ccbd700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c480 sd=19 :6789 pgs=0 cs=0 l=0).accept sd=19
>    -19> 2012-11-19 20:38:51.599729 7fc13fdc6780 10 mon.vault01@-1(probing) e1 check_fsid cluster_uuid contains '4d7d8d20-338c-4bdc-9918-9bcf04f9a832'
>    -18> 2012-11-19 20:38:51.599739 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75
>    -17> 2012-11-19 20:38:51.599745 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes
>    -16> 2012-11-19 20:38:51.599751 7fc13fdc6780 10 mon.vault01@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18)}
>    -15> 2012-11-19 20:38:51.599759 7fc13fdc6780 15 store(/ceph/mon.vault01) exists_bl joined
>    -14> 2012-11-19 20:38:51.599769 7fc13fdc6780 10 mon.vault01@-1(probing) e1 has_ever_joined = 1
>    -13> 2012-11-19 20:38:51.599794 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/last_committed = 133333
>    -12> 2012-11-19 20:38:51.599795 7fc13cdbe700  5 throttle(mon_daemon_bytes 0x7fffae473da0) get 56 (0 -> 56)
>    -11> 2012-11-19 20:38:51.599801 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/first_committed = 132833
>    -10> 2012-11-19 20:38:51.599805 7fc13cdbe700  5 throttle(msgr_dispatch_throttler-mon 0x2153488) get 56 (0 -> 56)
>     -9> 2012-11-19 20:38:51.599810 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 239840
>     -8> 2012-11-19 20:38:51.599904 7fc141eff700  5 throttle(mon_daemon_bytes 0x7fffae473da0) get 56 (56 -> 112)
>     -7> 2012-11-19 20:38:51.599910 7fc141eff700  5 throttle(msgr_dispatch_throttler-mon 0x2153488) get 56 (56 -> 112)
>     -6> 2012-11-19 20:38:51.599916 7fc13ccbd700  5 throttle(mon_daemon_bytes 0x7fffae473da0) get 62 (112 -> 174)
>     -5> 2012-11-19 20:38:51.599929 7fc13ccbd700  5 throttle(msgr_dispatch_throttler-mon 0x2153488) get 62 (112 -> 174)
>     -4> 2012-11-19 20:38:51.599928 7fc13cbbc700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213cd80 sd=20 :6789 pgs=0 cs=0 l=0).accept sd=20
>     -3> 2012-11-19 20:38:51.599950 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl pgmap/latest = 239840 bytes
>     -2> 2012-11-19 20:38:51.600027 7fc13cbbc700  5 throttle(mon_daemon_bytes 0x7fffae473da0) get 56 (174 -> 230)
>     -1> 2012-11-19 20:38:51.600033 7fc13cbbc700  5 throttle(msgr_dispatch_throttler-mon 0x2153488) get 56 (174 -> 230)
>      0> 2012-11-19 20:38:51.600509 7fc13fdc6780 -1 *** Caught signal (Aborted) **
>  in thread 7fc13fdc6780
>
>  ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150)
>  1: ceph-mon() [0x53adf8]
>  2: (()+0xfe90) [0x7fc141830e90]
>  3: (gsignal()+0x3e) [0x7fc140016dae]
>  4: (abort()+0x17b) [0x7fc14001825b]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc141af300d]
>  6: (()+0xb31b6) [0x7fc141af11b6]
>  7: (()+0xb31e3) [0x7fc141af11e3]
>  8: (()+0xb32de) [0x7fc141af12de]
>  9: ceph-mon() [0x5ecb9f]
>  10: (Paxos::get_stashed(ceph::buffer::list&)+0x1ed) [0x49e28d]
>  11: (Paxos::init()+0x109) [0x49e609]
>  12: (Monitor::init()+0x36a) [0x485a4a]
>  13: (main()+0x1289) [0x46d909]
>  14: (__libc_start_main()+0xed) [0x7fc14000364d]
>  15: ceph-mon() [0x46fa09]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
>    0/ 5 none
>    0/ 5 lockdep
>    0/ 5 context
>    1/ 5 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 0 buffer
>    0/ 5 timer
>    0/ 5 filer
>    0/ 0 objecter
>    0/ 5 rados
>    0/ 5 rbd
>    0/ 5 journaler
>    0/ 5 objectcacher
>    0/ 5 client
>    0/ 5 osd
>    0/ 5 optracker
>    0/ 5 objclass
>    1/ 5 filestore
>    1/ 5 journal
>    1/ 1 ms
>   20/20 mon
>    0/ 5 monc
>   20/20 paxos
>    0/ 5 tp
>   20/20 auth
>    1/ 5 finisher
>    1/ 5 heartbeatmap
>    1/ 5 perfcounter
>    1/ 5 rgw
>    1/ 5 hadoop
>    1/ 5 asok
>    1/ 5 throttle
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new      1000000
>   log_file /var/log/ceph/mon.vault01.log
> --- end dump of recent events ---
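
One way to read the backtrace in the dump above: frame 5, __gnu_cxx::__verbose_terminate_handler, is what libstdc++ runs for an uncaught C++ exception, and the first symbolized Ceph frame after it is Paxos::get_stashed, reached right after the store read pgmap/latest. That is consistent with, though not proof of, a stashed map that was damaged when the disk filled up. The sketch below pulls the symbolized frames out of such a trace; the helper name named_frames and the sample list TRACE (a few frames copied from the log) are illustrative only.

```python
import re

# Matches symbolized frames such as
#   "10: (Paxos::get_stashed(ceph::buffer::list&)+0x1ed) [0x49e28d]"
FRAME_RE = re.compile(r"^(\d+): \((.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]$")


def named_frames(lines):
    """Return (frame_number, symbol) for every frame that has a symbol name."""
    frames = []
    for raw in lines:
        # Drop the mail quote marker and surrounding whitespace.
        line = raw.lstrip("> ").rstrip()
        match = FRAME_RE.match(line)
        # Frames printed as "(()+0x...)" have no symbol and are skipped.
        if match and match.group(2) != "()":
            frames.append((int(match.group(1)), match.group(2)))
    return frames


TRACE = [
    ">  4: (abort()+0x17b) [0x7fc14001825b]",
    ">  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc141af300d]",
    ">  6: (()+0xb31b6) [0x7fc141af11b6]",
    ">  9: ceph-mon() [0x5ecb9f]",
    ">  10: (Paxos::get_stashed(ceph::buffer::list&)+0x1ed) [0x49e28d]",
    ">  11: (Paxos::init()+0x109) [0x49e609]",
]

if __name__ == "__main__":
    for number, symbol in named_frames(TRACE):
        print(number, symbol)
```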
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html