Re: Ceph Mon Crashing after creating Cephfs


 



Hi John

Thanks for that, life saver! I'm running Debian Jessie, so I replaced the main Ceph repo entry in sources.list.d with:

deb http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/ref/wip-17466-jewel/ jessie main
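For anyone who wants to script or sanity-check a repo switch like this one, here is a minimal sketch. It assumes the simple one-line APT entry format (type, URI, suite, components) and that the gitbuilder URI carries the branch name right after `/ref/`, as it does in the line above. The function name `parse_deb_line` and the dictionary keys are hypothetical, chosen only for illustration; bracketed APT options are not handled.

```python
def parse_deb_line(line):
    """Split a simple one-line APT source entry into its fields."""
    fields = line.split()
    if len(fields) < 4 or fields[0] not in ("deb", "deb-src"):
        raise ValueError("not a simple one-line APT source entry: %r" % line)
    uri = fields[1]
    # Assumption: gitbuilder URIs look like .../ref/<branch>/ as in this thread.
    parts = uri.rstrip("/").split("/")
    ref = parts[parts.index("ref") + 1] if "ref" in parts[:-1] else None
    return {
        "type": fields[0],
        "uri": uri,
        "suite": fields[2],
        "components": fields[3:],
        "ref": ref,
    }


entry = parse_deb_line(
    "deb http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/ref/wip-17466-jewel/ jessie main"
)
print(entry["suite"], entry["ref"])
```

Run against the line above, this reports the suite `jessie` and the branch `wip-17466-jewel`, which makes it easy to spot a machine that is still tracking a work-in-progress branch.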

I updated and upgraded Ceph, then tried to start my mon manually. That failed, but only because the mon had already been started during the upgrade!

Just to ask about the gitbuilder repos: is there a way I can track whether this patch gets merged into a mainline release (10.2.4 or something)? Are there any gotchas to consider when using them?

Thanks again. My domain controller thanks you, my mail server thanks you, and my web server thanks you!!!


James

On 7 October 2016 at 11:37, John Spray <jspray@xxxxxxxxxx> wrote:
On Fri, Oct 7, 2016 at 8:04 AM, James Horner <humankind135@xxxxxxxxx> wrote:
> Hi All
>
> Just wondering if anyone can help me out here. I have a small home cluster
> with 1 mon; the next phase of the plan called for more, but I hadn't got
> there yet.
>
> I was trying to set up CephFS and I ran "ceph fs new" without having an MDS,
> as I was having issues with rank 0 immediately being degraded. My thinking
> was that I would bring up an MDS and it would be assigned to rank 0. Anyhoo,
> after I did that my mon crashed and I haven't been able to restart it since.
> Its output is:
>
> root@bertie ~ $ /usr/bin/ceph-mon -f --cluster ceph --id bertie --setuser
> ceph --setgroup ceph 2>&1 | tee /var/log/ceph/mon-temp
> starting mon.bertie rank 0 at 192.168.2.3:6789/0 mon_data
> /var/lib/ceph/mon/ceph-bertie fsid 06e2f4e0-35e1-4f8c-b2a0-bc72c4cd3199
> terminate called after throwing an instance of 'std::out_of_range'
>   what():  map::at
> *** Caught signal (Aborted) **
>  in thread 7fad7f86c480 thread_name:ceph-mon
>  ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>  1: (()+0x525737) [0x56219142b737]
>  2: (()+0xf8d0) [0x7fad7eb3c8d0]
>  3: (gsignal()+0x37) [0x7fad7cdc6067]
>  4: (abort()+0x148) [0x7fad7cdc7448]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fad7d6b3b3d]
>  6: (()+0x5ebb6) [0x7fad7d6b1bb6]
>  7: (()+0x5ec01) [0x7fad7d6b1c01]
>  8: (()+0x5ee19) [0x7fad7d6b1e19]
>  9: (std::__throw_out_of_range(char const*)+0x66) [0x7fad7d707b76]
>  10: (FSMap::get_filesystem(int) const+0x7c) [0x56219126ed6c]
>  11: (MDSMonitor::maybe_promote_standby(std::shared_ptr<Filesystem>)+0x48a)
> [0x56219125b13a]
>  12: (MDSMonitor::tick()+0x4bb) [0x56219126084b]
>  13: (MDSMonitor::on_active()+0x28) [0x562191255da8]
>  14: (PaxosService::_active()+0x60a) [0x5621911d896a]
>  15: (PaxosService::election_finished()+0x7a) [0x5621911d8d7a]
>  16: (Monitor::win_election(unsigned int, std::set<int, std::less<int>,
> std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int,
> std::less<int>, std::allocator<int> > const*)+0x24e) [0x5621911958ce]
>  17: (Monitor::win_standalone_election()+0x20f) [0x562191195d9f]
>  18: (Monitor::bootstrap()+0x91b) [0x56219119676b]
>  19: (Monitor::init()+0x17d) [0x562191196a5d]
>  20: (main()+0x2694) [0x562191106f44]
>  21: (__libc_start_main()+0xf5) [0x7fad7cdb2b45]
>  22: (()+0x257edf) [0x56219115dedf]
> 2016-10-07 06:50:39.049061 7fad7f86c480 -1 *** Caught signal (Aborted) **
>  in thread 7fad7f86c480 thread_name:ceph-mon
>
>  ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>  1: (()+0x525737) [0x56219142b737]
>  2: (()+0xf8d0) [0x7fad7eb3c8d0]
>  3: (gsignal()+0x37) [0x7fad7cdc6067]
>  4: (abort()+0x148) [0x7fad7cdc7448]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fad7d6b3b3d]
>  6: (()+0x5ebb6) [0x7fad7d6b1bb6]
>  7: (()+0x5ec01) [0x7fad7d6b1c01]
>  8: (()+0x5ee19) [0x7fad7d6b1e19]
>  9: (std::__throw_out_of_range(char const*)+0x66) [0x7fad7d707b76]
>  10: (FSMap::get_filesystem(int) const+0x7c) [0x56219126ed6c]
>  11: (MDSMonitor::maybe_promote_standby(std::shared_ptr<Filesystem>)+0x48a)
> [0x56219125b13a]
>  12: (MDSMonitor::tick()+0x4bb) [0x56219126084b]
>  13: (MDSMonitor::on_active()+0x28) [0x562191255da8]
>  14: (PaxosService::_active()+0x60a) [0x5621911d896a]
>  15: (PaxosService::election_finished()+0x7a) [0x5621911d8d7a]
>  16: (Monitor::win_election(unsigned int, std::set<int, std::less<int>,
> std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int,
> std::less<int>, std::allocator<int> > const*)+0x24e) [0x5621911958ce]
>  17: (Monitor::win_standalone_election()+0x20f) [0x562191195d9f]
>  18: (Monitor::bootstrap()+0x91b) [0x56219119676b]
>  19: (Monitor::init()+0x17d) [0x562191196a5d]
>  20: (main()+0x2694) [0x562191106f44]
>  21: (__libc_start_main()+0xf5) [0x7fad7cdb2b45]
>  22: (()+0x257edf) [0x56219115dedf]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
>      0> 2016-10-07 06:50:39.049061 7fad7f86c480 -1 *** Caught signal
> (Aborted) **
>  in thread 7fad7f86c480 thread_name:ceph-mon
>
>  ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>  1: (()+0x525737) [0x56219142b737]
>  2: (()+0xf8d0) [0x7fad7eb3c8d0]
>  3: (gsignal()+0x37) [0x7fad7cdc6067]
>  4: (abort()+0x148) [0x7fad7cdc7448]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fad7d6b3b3d]
>  6: (()+0x5ebb6) [0x7fad7d6b1bb6]
>  7: (()+0x5ec01) [0x7fad7d6b1c01]
>  8: (()+0x5ee19) [0x7fad7d6b1e19]
>  9: (std::__throw_out_of_range(char const*)+0x66) [0x7fad7d707b76]
>  10: (FSMap::get_filesystem(int) const+0x7c) [0x56219126ed6c]
>  11: (MDSMonitor::maybe_promote_standby(std::shared_ptr<Filesystem>)+0x48a)
> [0x56219125b13a]
>  12: (MDSMonitor::tick()+0x4bb) [0x56219126084b]
>  13: (MDSMonitor::on_active()+0x28) [0x562191255da8]
>  14: (PaxosService::_active()+0x60a) [0x5621911d896a]
>  15: (PaxosService::election_finished()+0x7a) [0x5621911d8d7a]
>  16: (Monitor::win_election(unsigned int, std::set<int, std::less<int>,
> std::allocator<int> >&, unsigned long, MonCommand const*, int, std::set<int,
> std::less<int>, std::allocator<int> > const*)+0x24e) [0x5621911958ce]
>  17: (Monitor::win_standalone_election()+0x20f) [0x562191195d9f]
>  18: (Monitor::bootstrap()+0x91b) [0x56219119676b]
>  19: (Monitor::init()+0x17d) [0x562191196a5d]
>  20: (main()+0x2694) [0x562191106f44]
>  21: (__libc_start_main()+0xf5) [0x7fad7cdb2b45]
>  22: (()+0x257edf) [0x56219115dedf]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> Fairly sure it's a CephFS error, due to:
>  9: (std::__throw_out_of_range(char const*)+0x66) [0x7fad7d707b76]
>  10: (FSMap::get_filesystem(int) const+0x7c) [0x56219126ed6c]
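The reasoning above (the frame directly after the `std::__throw_*` frame is the caller that raised the exception) can be sketched in Python. This is a rough illustration, not a Ceph tool: the names `FRAME_RE`, `parse_frames`, and `throwing_caller` are hypothetical, the sample lines are copied from the trace in this thread, and frames whose symbol wraps across several lines are simply skipped.

```python
import re

# Matches lines such as
#   ">  10: (FSMap::get_filesystem(int) const+0x7c) [0x56219126ed6c]"
# The leading "> " quote marker and the trailing address are both optional.
FRAME_RE = re.compile(
    r"^\s*(?:>\s*)?(\d+):\s+\((.*)\+0x[0-9a-f]+\)(?:\s+\[0x[0-9a-f]+\])?\s*$"
)


def parse_frames(lines):
    """Map frame number -> symbol, or None when the symbol is unresolved."""
    frames = {}
    for line in lines:
        m = FRAME_RE.match(line)
        if m:
            symbol = m.group(2)
            frames[int(m.group(1))] = None if symbol == "()" else symbol
    return frames


def throwing_caller(frames):
    """Return the symbol of the frame that called a std::__throw_* helper."""
    for index in sorted(frames):
        symbol = frames[index]
        if symbol and symbol.startswith("std::__throw_"):
            return frames.get(index + 1)
    return None


sample = [
    ">  8: (()+0x5ee19) [0x7fad7d6b1e19]",
    ">  9: (std::__throw_out_of_range(char const*)+0x66) [0x7fad7d707b76]",
    ">  10: (FSMap::get_filesystem(int) const+0x7c) [0x56219126ed6c]",
    ">  11: (MDSMonitor::maybe_promote_standby(std::shared_ptr<Filesystem>)+0x48a)",
]
frames = parse_frames(sample)
print(throwing_caller(frames))
```

On the sample lines this prints `FSMap::get_filesystem(int) const`, matching the conclusion drawn from frames 9 and 10.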

It looks like you're hitting this:
http://tracker.ceph.com/issues/17466

There is a branch called wip-17466-jewel that has a fix cherry-picked
onto 10.2.3. Hopefully, if you install the mon from that branch, your
mons will be happy again.

Packages:
http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic/ref/wip-17466-jewel/
http://gitbuilder.ceph.com/ceph-rpm-centos7-x86_64-basic/ref/wip-17466-jewel/

Or, of course, you can build your own if you're on a platform that isn't
on gitbuilder.ceph.com.

John

> I have nothing in CephFS, but I had just finished moving all my VMs into
> RADOS. I don't care if CephFS gets wiped, but I really need the VM images.
>
> If the mon is borked permanently, is there a way I can recover the
> images manually?
>
> Thanks in advance for any help
>
> James
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

