In my case the version will be identical. But I might have to fall back to this
node-by-node approach if I can't stabilize the more general shutdown/bring-up
approach. There are 192 OSDs in my cluster, so going node by node will
unfortunately take a while.

-Chris

> On Mar 1, 2017, at 2:50 AM, Steffen Weißgerber <WeissgerberS@xxxxxxx> wrote:
>
> Hello,
>
> some time ago I upgraded our 6-node cluster (0.94.9) running on Ubuntu from
> Trusty to Xenial.
>
> The problem was that the OS update would also have upgraded Ceph, which we
> did not want to do in the same step, because then we would have had to
> upgrade all nodes at the same time.
>
> Therefore we did it node by node, first draining the OSDs on each node by
> setting their weight to 0.
>
> After the OS update, we configured the right Ceph version for our setup and
> tested a reboot so that all components start up correctly; then we set the
> OSD weights back to their normal values so that the cluster rebalanced.
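> In rough script form, the drain step looks something like this (the OSD ids
> and the restore weight are placeholders; note the real CRUSH weights from
> `ceph osd tree` before zeroing them):
>
>     # Drain all OSDs on the node about to be upgraded (sketch).
>     for id in 0 1 2 3; do
>         ceph osd crush reweight osd.$id 0      # migrate PGs off this OSD
>     done
>     # Wait until `ceph -s` shows all PGs active+clean, then upgrade the node.
>
>     # After the upgrade, restore the recorded weights to rebalance back.
>     for id in 0 1 2 3; do
>         ceph osd crush reweight osd.$id 1.0    # 1.0 is a placeholder weight
>     done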
> With this procedure the cluster was always up.
>
> Regards
>
> Steffen
>
>
>>>> "Heller, Chris" <cheller@xxxxxxxxxx> wrote on Monday, February 27, 2017 at
> 18:01:
>> I am attempting an operating system upgrade of a live Ceph cluster. Before I
>> go and screw up my production system, I have been testing on a smaller
>> installation, and I keep running into issues when bringing the Ceph FS
>> metadata server online.
>>
>> My approach here has been to store all Ceph-critical files on non-root
>> partitions, so the OS install can safely proceed without overwriting any of
>> the Ceph configuration or data.
>>
>> Here is how I proceed:
>>
>> First I bring down the Ceph FS via `ceph mds cluster_down`.
>> Second, to prevent OSDs from trying to repair data, I run `ceph osd set noout`.
>> Finally I stop the Ceph processes in the following order: ceph-mds, ceph-mon,
>> ceph-osd.
>>
>> Note my cluster has 1 mds, 1 mon, and 7 osds.
>>
>> I then install the new OS and bring the cluster back up by walking the steps
>> in reverse:
>>
>> First I start the Ceph processes in the following order: ceph-osd, ceph-mon,
>> ceph-mds.
>> Second I restore OSD functionality with `ceph osd unset noout`.
>> Finally I bring up the Ceph FS via `ceph mds cluster_up`.
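>> As a script, the whole sequence is roughly the following (the
>> `service ceph ...` invocations are an assumption for a sysvinit-style setup;
>> substitute whatever init system the nodes actually run):
>>
>>     # Shutdown (sketch)
>>     ceph mds cluster_down      # stop the MDS cluster cleanly
>>     ceph osd set noout         # don't mark OSDs out while they are down
>>     service ceph stop mds
>>     service ceph stop mon      # no more `ceph` commands after this point
>>     service ceph stop osd
>>
>>     # ... install the new OS ...
>>
>>     # Bring-up (sketch), reverse order
>>     service ceph start osd
>>     service ceph start mon
>>     service ceph start mds
>>     ceph osd unset noout
>>     ceph mds cluster_up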
>> Everything works smoothly except the Ceph FS bring-up. The MDS starts in the
>> up:replay state and eventually crashes with the following backtrace:
>>
>> starting mds.cuba at :/0
>> 2017-02-27 16:56:08.233680 7f31daa3b7c0 -1 mds.-1.0 log_to_monitors {default=true}
>> 2017-02-27 16:56:08.537714 7f31d30df700 -1 mds.0.sessionmap _load_finish got (2) No such file or directory
>> mds/SessionMap.cc: In function 'void SessionMap::_load_finish(int, ceph::bufferlist&)' thread 7f31d30df700 time 2017-02-27 16:56:08.537739
>> mds/SessionMap.cc: 98: FAILED assert(0 == "failed to load sessionmap")
>> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x98bb4b]
>> 2: (SessionMap::_load_finish(int, ceph::buffer::list&)+0x2b4) [0x7df2a4]
>> 3: (MDSIOContextBase::complete(int)+0x95) [0x7e34b5]
>> 4: (Finisher::finisher_thread_entry()+0x190) [0x8bd6d0]
>> 5: (()+0x8192) [0x7f31d9c8f192]
>> 6: (clone()+0x6d) [0x7f31d919c51d]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> [the same assert and backtrace are repeated twice more, including in the
>> recent-events dump:]
>>
>> -106> 2017-02-27 16:56:08.233680 7f31daa3b7c0 -1 mds.-1.0 log_to_monitors {default=true}
>> -1> 2017-02-27 16:56:08.537714 7f31d30df700 -1 mds.0.sessionmap _load_finish got (2) No such file or directory
>> 0> 2017-02-27 16:56:08.538493 7f31d30df700 -1 mds/SessionMap.cc: 98: FAILED assert(0 == "failed to load sessionmap")
>>
>> terminate called after throwing an instance of 'ceph::FailedAssertion'
>> *** Caught signal (Aborted) **
>> in thread 7f31d30df700
>> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>> 1: ceph_mds() [0x89984a]
>> 2: (()+0x10350) [0x7f31d9c97350]
>> 3: (gsignal()+0x39) [0x7f31d90d8c49]
>> 4: (abort()+0x148) [0x7f31d90dc058]
>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f31d99e3555]
>> 6: (()+0x5e6f6) [0x7f31d99e16f6]
>> 7: (()+0x5e723) [0x7f31d99e1723]
>> 8: (()+0x5e942) [0x7f31d99e1942]
>> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x278) [0x98bd38]
>> 10: (SessionMap::_load_finish(int, ceph::buffer::list&)+0x2b4) [0x7df2a4]
>> 11: (MDSIOContextBase::complete(int)+0x95) [0x7e34b5]
>> 12: (Finisher::finisher_thread_entry()+0x190) [0x8bd6d0]
>> 13: (()+0x8192) [0x7f31d9c8f192]
>> 14: (clone()+0x6d) [0x7f31d919c51d]
>>
>> [the Aborted backtrace is repeated twice more in the log output]
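>> Since the error is ENOENT from the sessionmap load, one thing worth checking
>> is whether the sessionmap object still exists in the metadata pool. A sketch
>> (the pool name "metadata" and the rank-0 object name "mds0_sessionmap" are
>> assumptions for a default single-MDS setup):
>>
>>     rados -p metadata ls | grep sessionmap    # list any sessionmap objects
>>     rados -p metadata stat mds0_sessionmap    # stat the rank-0 sessionmap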
>> How can I safely stop a Ceph cluster, so that it will cleanly start back up
>> again?
>>
>> -Chris

> --
> Klinik-Service Neubrandenburg GmbH
> Allendestr. 30, 17036 Neubrandenburg
> Amtsgericht Neubrandenburg, HRB 2457
> Geschaeftsfuehrerin: Gudrun Kappich