Re: Can't start ceph mon

Also, if you still have it, could you zip up your monitor data
directory and put it somewhere accessible to us? (I can provide you a
drop point if necessary.) We'd like to look at the file layouts a bit
since we thought we were properly handling ENOSPC-style issues.
-Greg
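
One way to pack up the monitor data directory for sharing, as a minimal Python sketch. The function name `archive_mon_dir` and the output location are assumptions made for illustration; only the store path `/ceph/mon.vault01` comes from the log quoted below. Since the failure is suspected to be disk-space related, it is probably safer to write the archive to a different filesystem than the one holding the store.

```python
import tarfile
from pathlib import Path


def archive_mon_dir(mon_dir: str, out_path: str) -> str:
    """Pack mon_dir into a gzip-compressed tarball at out_path.

    Hypothetical helper for illustration; not part of Ceph.
    Returns out_path so the caller can log or upload it.
    """
    src = Path(mon_dir)
    if not src.is_dir():
        raise NotADirectoryError(f"{mon_dir} is not a directory")
    with tarfile.open(out_path, "w:gz") as tar:
        # arcname stores entries under the directory's own name
        # (e.g. "mon.vault01/...") instead of the absolute path.
        tar.add(str(src), arcname=src.name)
    return out_path
```

For example, `archive_mon_dir("/ceph/mon.vault01", "/tmp/mon.vault01.tar.gz")` would produce a tarball whose entries sit under `mon.vault01/`, which keeps the file layout intact for inspection. Stop the monitor before archiving so the store is not modified mid-copy.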

On Mon, Nov 19, 2012 at 1:45 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Mon, Nov 19, 2012 at 1:08 PM, Dave Humphreys (Datatone)
> <dave@xxxxxxxxxxxxxx> wrote:
>>
>> I have a problem in which I can't start my ceph monitor. The log is shown below.
>>
>> The log shows version 0.54. I was running 0.52 when the problem arose, and I moved to the latest in case the newer version fixed the problem.
>>
>> The original failure happened a week or so ago, and may have been caused by running out of disk space when the ceph monitor log became huge.
>
> That is almost certainly the case, although I thought we were handling
> this issue better now.
>
>> What should I do to recover the situation?
>
> Do you have other monitors in working order? The easiest way to handle
> it if that's the case is just to remove this monitor from the cluster
> and add it back in as a new monitor with a fresh store. If not, we can
> look into reconstructing it.
> -Greg
>
>>
>>
>> David
>>
>>
>>
>>
>>
>> 2012-11-19 20:38:51.598468 7fc13fdc6780  0 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150), process ceph-mon, pid 21012
>> 2012-11-19 20:38:51.598482 7fc13fdc6780  1 store(/ceph/mon.vault01) mount
>> 2012-11-19 20:38:51.598527 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 21
>> 2012-11-19 20:38:51.598542 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl magic = 21 bytes
>> 2012-11-19 20:38:51.598562 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75
>> 2012-11-19 20:38:51.598567 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes
>> 2012-11-19 20:38:51.598582 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 205
>> 2012-11-19 20:38:51.598586 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl monmap/latest = 205 bytes
>> 2012-11-19 20:38:51.598809 7fc13fdc6780  1 -- 10.0.1.1:6789/0 learned my addr 10.0.1.1:6789/0
>> 2012-11-19 20:38:51.598818 7fc13fdc6780  1 accepter.accepter.bind my_inst.addr is 10.0.1.1:6789/0 need_addr=0
>> 2012-11-19 20:38:51.599498 7fc13fdc6780  1 -- 10.0.1.1:6789/0 messenger.start
>> 2012-11-19 20:38:51.599508 7fc13fdc6780  1 accepter.accepter.start
>> 2012-11-19 20:38:51.599610 7fc13fdc6780  1 mon.vault01@-1(probing) e1 init fsid 4d7d8d20-338c-4bdc-9918-9bcf04f9a832
>> 2012-11-19 20:38:51.599674 7fc13cdbe700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c6c0 sd=14 :6789 pgs=0 cs=0 l=0).accept sd=14
>> 2012-11-19 20:38:51.599678 7fc141eff700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c240 sd=9 :6789 pgs=0 cs=0 l=0).accept sd=9
>> 2012-11-19 20:38:51.599718 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 37
>> 2012-11-19 20:38:51.599723 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl cluster_uuid = 37 bytes
>> 2012-11-19 20:38:51.599718 7fc13ccbd700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c480 sd=19 :6789 pgs=0 cs=0 l=0).accept sd=19
>> 2012-11-19 20:38:51.599729 7fc13fdc6780 10 mon.vault01@-1(probing) e1 check_fsid cluster_uuid contains '4d7d8d20-338c-4bdc-9918-9bcf04f9a832'
>> 2012-11-19 20:38:51.599739 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75
>> 2012-11-19 20:38:51.599745 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes
>> 2012-11-19 20:38:51.599751 7fc13fdc6780 10 mon.vault01@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18)}
>> 2012-11-19 20:38:51.599759 7fc13fdc6780 15 store(/ceph/mon.vault01) exists_bl joined
>> 2012-11-19 20:38:51.599769 7fc13fdc6780 10 mon.vault01@-1(probing) e1 has_ever_joined = 1
>> 2012-11-19 20:38:51.599794 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/last_committed = 133333
>> 2012-11-19 20:38:51.599801 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/first_committed = 132833
>> 2012-11-19 20:38:51.599810 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 239840
>> 2012-11-19 20:38:51.599928 7fc13cbbc700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213cd80 sd=20 :6789 pgs=0 cs=0 l=0).accept sd=20
>> 2012-11-19 20:38:51.599950 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl pgmap/latest = 239840 bytes
>> --- begin dump of recent events ---2012-11-19 20:38:51.600509 7fc13fdc6780 -1
>> *** Caught signal (Aborted) **
>>  in thread 7fc13fdc6780
>>
>>  ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150)
>>  1: ceph-mon() [0x53adf8]
>>  2: (()+0xfe90) [0x7fc141830e90]
>>  3: (gsignal()+0x3e) [0x7fc140016dae]
>>  4: (abort()+0x17b) [0x7fc14001825b]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc141af300d]
>>  6: (()+0xb31b6) [0x7fc141af11b6]
>>  7: (()+0xb31e3) [0x7fc141af11e3]
>>  8: (()+0xb32de) [0x7fc141af12de]
>>  9: ceph-mon() [0x5ecb9f]
>>  10: (Paxos::get_stashed(ceph::buffer::list&)+0x1ed) [0x49e28d]
>>  11: (Paxos::init()+0x109) [0x49e609]
>>  12: (Monitor::init()+0x36a) [0x485a4a]
>>  13: (main()+0x1289) [0x46d909]
>>  14: (__libc_start_main()+0xed) [0x7fc14000364d]
>>  15: ceph-mon() [0x46fa09]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>>    -55> 2012-11-19 20:38:51.596694 7fc13fdc6780  5 asok(0x213d000) register_command perfcounters_dump hook 0x2131050
>>    -55> 2012-11-19 20:38:51.596720 7fc13fdc6780  5 asok(0x213d000) register_command 1 hook 0x2131050
>>    -54> 2012-11-19 20:38:51.596725 7fc13fdc6780  5 asok(0x213d000) register_command perf dump hook 0x2131050
>>    -53> 2012-11-19 20:38:51.596735 7fc13fdc6780  5 asok(0x213d000) register_command perfcounters_schema hook 0x2131050
>>    -52> 2012-11-19 20:38:51.596740 7fc13fdc6780  5 asok(0x213d000) register_command 2 hook 0x2131050
>>    -51> 2012-11-19 20:38:51.596745 7fc13fdc6780  5 asok(0x213d000) register_command perf schema hook 0x2131050
>>    -50> 2012-11-19 20:38:51.596752 7fc13fdc6780  5 asok(0x213d000) register_command config show hook 0x2131050
>>    -49> 2012-11-19 20:38:51.596756 7fc13fdc6780  5 asok(0x213d000) register_command config set hook 0x2131050
>>    -48> 2012-11-19 20:38:51.596761 7fc13fdc6780  5 asok(0x213d000) register_command log flush hook 0x2131050
>>    -47> 2012-11-19 20:38:51.596765 7fc13fdc6780  5 asok(0x213d000) register_command log dump hook 0x2131050
>>    -46> 2012-11-19 20:38:51.596770 7fc13fdc6780  5 asok(0x213d000) register_command log reopen hook 0x2131050
>>    -45> 2012-11-19 20:38:51.598468 7fc13fdc6780  0 ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150), process ceph-mon, pid 21012
>>    -44> 2012-11-19 20:38:51.598482 7fc13fdc6780  1 store(/ceph/mon.vault01) mount
>>    -43> 2012-11-19 20:38:51.598527 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 21
>>    -42> 2012-11-19 20:38:51.598542 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl magic = 21 bytes
>>    -41> 2012-11-19 20:38:51.598562 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75
>>    -40> 2012-11-19 20:38:51.598567 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes
>>    -39> 2012-11-19 20:38:51.598582 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 205
>>    -38> 2012-11-19 20:38:51.598586 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl monmap/latest = 205 bytes
>>    -37> 2012-11-19 20:38:51.598809 7fc13fdc6780  1 -- 10.0.1.1:6789/0 learned my addr 10.0.1.1:6789/0
>>    -36> 2012-11-19 20:38:51.598818 7fc13fdc6780  1 accepter.accepter.bind my_inst.addr is 10.0.1.1:6789/0 need_addr=0
>>    -35> 2012-11-19 20:38:51.599219 7fc13fdc6780  1 finished global_init_daemonize
>>    -34> 2012-11-19 20:38:51.599350 7fc13fdc6780  5 asok(0x213d000) init /var/run/ceph/ceph-mon.vault01.asok
>>    -33> 2012-11-19 20:38:51.599371 7fc13fdc6780  5 asok(0x213d000) bind_and_listen /var/run/ceph/ceph-mon.vault01.asok
>>    -32> 2012-11-19 20:38:51.599438 7fc13fdc6780  5 asok(0x213d000) register_command 0 hook 0x2130030
>>    -31> 2012-11-19 20:38:51.599444 7fc13fdc6780  5 asok(0x213d000) register_command version hook 0x2130030
>>    -30> 2012-11-19 20:38:51.599451 7fc13fdc6780  5 asok(0x213d000) register_command git_version hook 0x2130030
>>    -29> 2012-11-19 20:38:51.599456 7fc13fdc6780  5 asok(0x213d000) register_command help hook 0x2131040
>>    -28> 2012-11-19 20:38:51.599472 7fc13edc2700  5 asok(0x213d000) entry start
>>    -27> 2012-11-19 20:38:51.599498 7fc13fdc6780  1 -- 10.0.1.1:6789/0 messenger.start
>>    -26> 2012-11-19 20:38:51.599508 7fc13fdc6780  1 accepter.accepter.start
>>    -25> 2012-11-19 20:38:51.599610 7fc13fdc6780  1 mon.vault01@-1(probing) e1 init fsid 4d7d8d20-338c-4bdc-9918-9bcf04f9a832
>>    -24> 2012-11-19 20:38:51.599674 7fc13cdbe700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c6c0 sd=14 :6789 pgs=0 cs=0 l=0).accept sd=14
>>    -23> 2012-11-19 20:38:51.599678 7fc141eff700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c240 sd=9 :6789 pgs=0 cs=0 l=0).accept sd=9
>>    -22> 2012-11-19 20:38:51.599718 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 37
>>    -21> 2012-11-19 20:38:51.599723 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl cluster_uuid = 37 bytes
>>    -20> 2012-11-19 20:38:51.599718 7fc13ccbd700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213c480 sd=19 :6789 pgs=0 cs=0 l=0).accept sd=19
>>    -19> 2012-11-19 20:38:51.599729 7fc13fdc6780 10 mon.vault01@-1(probing) e1 check_fsid cluster_uuid contains '4d7d8d20-338c-4bdc-9918-9bcf04f9a832'
>>    -18> 2012-11-19 20:38:51.599739 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 75
>>    -17> 2012-11-19 20:38:51.599745 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl feature_set = 75 bytes
>>    -16> 2012-11-19 20:38:51.599751 7fc13fdc6780 10 mon.vault01@-1(probing) e1 features compat={},rocompat={},incompat={1=initial feature set (~v.18)}
>>    -15> 2012-11-19 20:38:51.599759 7fc13fdc6780 15 store(/ceph/mon.vault01) exists_bl joined
>>    -14> 2012-11-19 20:38:51.599769 7fc13fdc6780 10 mon.vault01@-1(probing) e1 has_ever_joined = 1
>>    -13> 2012-11-19 20:38:51.599794 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/last_committed = 133333
>>    -12> 2012-11-19 20:38:51.599795 7fc13cdbe700  5 throttle(mon_daemon_bytes 0x7fffae473da0) get 56 (0 -> 56)
>>    -11> 2012-11-19 20:38:51.599801 7fc13fdc6780 15 store(/ceph/mon.vault01) get_int pgmap/first_committed = 132833
>>    -10> 2012-11-19 20:38:51.599805 7fc13cdbe700  5 throttle(msgr_dispatch_throttler-mon 0x2153488) get 56 (0 -> 56)
>>     -9> 2012-11-19 20:38:51.599810 7fc13fdc6780 20 store(/ceph/mon.vault01) reading at off 0 of 239840
>>     -8> 2012-11-19 20:38:51.599904 7fc141eff700  5 throttle(mon_daemon_bytes 0x7fffae473da0) get 56 (56 -> 112)
>>     -7> 2012-11-19 20:38:51.599910 7fc141eff700  5 throttle(msgr_dispatch_throttler-mon 0x2153488) get 56 (56 -> 112)
>>     -6> 2012-11-19 20:38:51.599916 7fc13ccbd700  5 throttle(mon_daemon_bytes 0x7fffae473da0) get 62 (112 -> 174)
>>     -5> 2012-11-19 20:38:51.599929 7fc13ccbd700  5 throttle(msgr_dispatch_throttler-mon 0x2153488) get 62 (112 -> 174)
>>     -4> 2012-11-19 20:38:51.599928 7fc13cbbc700  1 -- 10.0.1.1:6789/0 >> :/0 pipe(0x213cd80 sd=20 :6789 pgs=0 cs=0 l=0).accept sd=20
>>     -3> 2012-11-19 20:38:51.599950 7fc13fdc6780 15 store(/ceph/mon.vault01) get_bl pgmap/latest = 239840 bytes
>>     -2> 2012-11-19 20:38:51.600027 7fc13cbbc700  5 throttle(mon_daemon_bytes 0x7fffae473da0) get 56 (174 -> 230)
>>     -1> 2012-11-19 20:38:51.600033 7fc13cbbc700  5 throttle(msgr_dispatch_throttler-mon 0x2153488) get 56 (174 -> 230)
>>      0> 2012-11-19 20:38:51.600509 7fc13fdc6780 -1 *** Caught signal (Aborted) **
>>  in thread 7fc13fdc6780
>>
>>  ceph version 0.54 (commit:60b84b095b1009a305d4d6a5b16f88571cbd3150)
>>  1: ceph-mon() [0x53adf8]
>>  2: (()+0xfe90) [0x7fc141830e90]
>>  3: (gsignal()+0x3e) [0x7fc140016dae]
>>  4: (abort()+0x17b) [0x7fc14001825b]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fc141af300d]
>>  6: (()+0xb31b6) [0x7fc141af11b6]
>>  7: (()+0xb31e3) [0x7fc141af11e3]
>>  8: (()+0xb32de) [0x7fc141af12de]
>>  9: ceph-mon() [0x5ecb9f]
>>  10: (Paxos::get_stashed(ceph::buffer::list&)+0x1ed) [0x49e28d]
>>  11: (Paxos::init()+0x109) [0x49e609]
>>  12: (Monitor::init()+0x36a) [0x485a4a]
>>  13: (main()+0x1289) [0x46d909]
>>  14: (__libc_start_main()+0xed) [0x7fc14000364d]
>>  15: ceph-mon() [0x46fa09]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- logging levels ---
>>    0/ 5 none
>>    0/ 5 lockdep
>>    0/ 5 context
>>    1/ 5 crush
>>    1/ 5 mds
>>    1/ 5 mds_balancer
>>    1/ 5 mds_locker
>>    1/ 5 mds_log
>>    1/ 5 mds_log_expire
>>    1/ 5 mds_migrator
>>    0/ 0 buffer
>>    0/ 5 timer
>>    0/ 5 filer
>>    0/ 0 objecter
>>    0/ 5 rados
>>    0/ 5 rbd
>>    0/ 5 journaler
>>    0/ 5 objectcacher
>>    0/ 5 client
>>    0/ 5 osd
>>    0/ 5 optracker
>>    0/ 5 objclass
>>    1/ 5 filestore
>>    1/ 5 journal
>>    1/ 1 ms
>>   20/20 mon
>>    0/ 5 monc
>>   20/20 paxos
>>    0/ 5 tp
>>   20/20 auth
>>    1/ 5 finisher
>>    1/ 5 heartbeatmap
>>    1/ 5 perfcounter
>>    1/ 5 rgw
>>    1/ 5 hadoop
>>    1/ 5 asok
>>    1/ 5 throttle
>>   -2/-2 (syslog threshold)
>>   -1/-1 (stderr threshold)
>>   max_recent     10000
>>   max_new      1000000
>>   log_file /var/log/ceph/mon.vault01.log
>> --- end dump of recent events ---
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux