I just built a Ceph cluster and was, unfortunately, hit by this :(

I managed to restart the mgrs (2 of them) by manually editing
/var/run/ceph/<cluster>/mgr.<name>/unit.run. But now I have a problem that I
really don't understand:

- both managers are running and appear in "ceph -s" as
  "mgr: cephadm.mxrhsp(active, since 62m), standbys: ceph01.fwtity"
- the orchestrator looks a little "confused":

# ceph orch ps --daemon-type mgr
NAME                HOST     PORTS        STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION             IMAGE ID      CONTAINER ID
mgr.ceph01.fwtity   ceph01   *:8443,9283  error          62m ago    2h   -        -        <unknown>           <unknown>     <unknown>
mgr.cephadm.mxrhsp  cephadm  *:9283       running (63m)  62m ago    2h   437M     -        17.2.2-1-gf516549e  5081f5a97849  0f0bc2c6791f

Because of this I can't run "ceph orch upgrade": it always complains about
having only one mgr. Is there something else that needs to be changed to get
the cluster back to a normal state?

Thanks!

On Wed, 2022-07-27 at 12:23 -0400, Adam King wrote:
> yeah, that works if there is a working mgr to send the command to. I was
> assuming here all the mgr daemons were down, since it was a fresh cluster, so
> all the mgrs would have this bugged image.
>
> On Wed, Jul 27, 2022 at 12:07 PM Vikhyat Umrao <vikhyat@xxxxxxxxxx> wrote:
>
> > Adam - or we could simply redeploy the daemon with the new image? At least
> > this is something I did in our testing here [1].
> >
> > ceph orch daemon redeploy mgr.<name> quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531
> >
> > [1] https://github.com/ceph/ceph/pull/47270#issuecomment-1196062363
> >
> > On Wed, Jul 27, 2022 at 8:55 AM Adam King <adking@xxxxxxxxxx> wrote:
> >
> > > the unit.image file is just there for cephadm to look at as part of
> > > gathering metadata, I think. What you'd want to edit is the unit.run file
> > > (in the same directory as the unit.image). It should have a really long
> > > line specifying a podman/docker run command, and somewhere in there will be
> > > "CONTAINER_IMAGE=<old-image-name>". You'd need to change that to say
> > > "CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531",
> > > then restart the service.
> > >
> > > On Wed, Jul 27, 2022 at 11:46 AM Daniel Schreiber
> > > <daniel.schreiber@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > Hi Neha,
> > > >
> > > > thanks for the quick response. Sorry for the stupid question: to use
> > > > that image, I pull the image on the machine, then change
> > > > /var/lib/ceph/${clusterid}/mgr.${unit}/unit.image and start the service?
> > > >
> > > > Thanks,
> > > >
> > > > Daniel
> > > >
> > > > On 27.07.22 at 17:23, Neha Ojha wrote:
> > > > > Hi Daniel,
> > > > >
> > > > > This issue seems to be showing up in 17.2.2, details in
> > > > > https://tracker.ceph.com/issues/55304. We are currently in the process
> > > > > of validating the fix https://github.com/ceph/ceph/pull/47270 and
> > > > > we'll try to expedite a quick fix.
> > > > >
> > > > > In the meantime, we have builds/images of the dev version of the fix,
> > > > > in case you want to give it a try.
> > > > > https://shaman.ceph.com/builds/ceph/wip-quincy-libcephsqlite-fix/
> > > > > quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531
> > > > >
> > > > > Thanks,
> > > > > Neha
> > > > >
> > > > > On Wed, Jul 27, 2022 at 8:10 AM Daniel Schreiber
> > > > > <daniel.schreiber@xxxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I installed a fresh cluster using cephadm:
> > > > > >
> > > > > > - bootstrapped one node
> > > > > > - extended it to 3 monitor nodes, each running mon + mgr, using a
> > > > > >   spec file
> > > > > > - added 12 OSD hosts to the spec file with the following disk rules:
> > > > > >
> > > > > > ~~~
> > > > > > service_type: osd
> > > > > > service_id: osd_spec_hdd
> > > > > > placement:
> > > > > >   label: osd
> > > > > > spec:
> > > > > >   data_devices:
> > > > > >     model: "HGST HUH721212AL"    # HDDs
> > > > > >   db_devices:
> > > > > >     model: "SAMSUNG MZ7KH1T9"    # SATA SSDs
> > > > > >
> > > > > > ---
> > > > > >
> > > > > > service_type: osd
> > > > > > service_id: osd_spec_nvme
> > > > > > placement:
> > > > > >   label: osd
> > > > > > spec:
> > > > > >   data_devices:
> > > > > >     model: "SAMSUNG MZPLL1T6HAJQ-00005"    # NVMes
> > > > > > ~~~
> > > > > >
> > > > > > OSDs on HDD + SSD were deployed, NVMe OSDs were not.
> > > > > >
> > > > > > The MGRs crashed, one after the other:
> > > > > >
> > > > > > debug    -65> 2022-07-25T17:06:36.507+0000 7f4a33f80700  5 cephsqlite: FullPathname: (client.17139) 1: /.mgr:devicehealth/main.db
> > > > > > debug    -64> 2022-07-25T17:06:36.507+0000 7f4a34f82700  0 [dashboard INFO sso] Loading SSO DB version=1
> > > > > > debug    -63> 2022-07-25T17:06:36.507+0000 7f4a34f82700  4 mgr get_store get_store key: mgr/dashboard/ssodb_v1
> > > > > > debug    -62> 2022-07-25T17:06:36.507+0000 7f4a34f82700  4 ceph_store_get ssodb_v1 not found
> > > > > > debug    -61> 2022-07-25T17:06:36.507+0000 7f4a34f82700  0 [dashboard INFO root] server: ssl=no host=:: port=8080
> > > > > > debug    -60> 2022-07-25T17:06:36.507+0000 7f4a34f82700  0 [dashboard INFO root] Configured CherryPy, starting engine...
> > > > > > debug    -59> 2022-07-25T17:06:36.507+0000 7f4a34f82700  4 mgr set_uri module dashboard set URI 'http://192.168.14.201:8080/'
> > > > > > debug    -58> 2022-07-25T17:06:36.511+0000 7f4a64e91700  4 ceph_store_get active_devices not found
> > > > > > debug    -57> 2022-07-25T17:06:36.511+0000 7f4a33f80700 -1 *** Caught signal (Aborted) **
> > > > > >  in thread 7f4a33f80700 thread_name:devicehealth
> > > > > >
> > > > > >  ceph version 17.2.2 (b6e46b8939c67a6cc754abb4d0ece3c8918eccc3) quincy (stable)
> > > > > >  1: /lib64/libpthread.so.0(+0x12ce0) [0x7f4a9b0d0ce0]
> > > > > >  2: gsignal()
> > > > > >  3: abort()
> > > > > >  4: /lib64/libstdc++.so.6(+0x9009b) [0x7f4a9a4cf09b]
> > > > > >  5: /lib64/libstdc++.so.6(+0x9653c) [0x7f4a9a4d553c]
> > > > > >  6: /lib64/libstdc++.so.6(+0x96597) [0x7f4a9a4d5597]
> > > > > >  7: /lib64/libstdc++.so.6(+0x967f8) [0x7f4a9a4d57f8]
> > > > > >  8: (std::__throw_regex_error(std::regex_constants::error_type, char const*)+0x4a) [0x5607b31d5eea]
> > > > > >  9: (bool std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_expression_term<false, false>(std::__detail::_Compiler<std::__cxx11::regex>
> > > > > >  10: (void std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_insert_bracket_matcher<false, false>(bool)+0x146) [0x5607b31e26b6]
> > > > > >  11: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_bracket_expression()+0x6b) [0x5607b31e663b]
> > > > > >  12: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x6a) [0x5607b31e671a]
> > > > > >  13: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0xd0) [0x5607b31e6ca0]
> > > > > >  14: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x30) [0x5607b31e6df0]
> > > > > >  15: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x338) [0x5607b31e69e8]
> > > > > >  16: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0xd0) [0x5607b31e6ca0]
> > > > > >  17: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
> > > > > >  18: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
> > > > > >  19: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
> > > > > >  20: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
> > > > > >  21: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x30) [0x5607b31e6df0]
> > > > > >  22: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syn>
> > > > > >  23: /lib64/libcephsqlite.so(+0x1b7ca) [0x7f4a9d8ba7ca]
> > > > > >  24: /lib64/libcephsqlite.so(+0x24486) [0x7f4a9d8c3486]
> > > > > >  25: /lib64/libsqlite3.so.0(+0x75f1c) [0x7f4a9d600f1c]
> > > > > >  26: /lib64/libsqlite3.so.0(+0xdd4c9) [0x7f4a9d6684c9]
> > > > > >  27: pysqlite_connection_init()
> > > > > >  28: /lib64/libpython3.6m.so.1.0(+0x13afc6) [0x7f4a9d182fc6]
> > > > > >  29: PyObject_Call()
> > > > > >  30: /lib64/python3.6/lib-dynload/_sqlite3.cpython-36m-x86_64-linux-gnu.so(+0xa1f5) [0x7f4a8bdf31f5]
> > > > > >  31: /lib64/libpython3.6m.so.1.0(+0x19d5f1) [0x7f4a9d1e55f1]
> > > > > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > > > > >
> > > > > > Is there anything I can do to recover from this? Is there anything I can
> > > > > > add to help debugging this?
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > Daniel
> > > > > > --
> > > > > > Daniel Schreiber
> > > > > > Facharbeitsgruppe Systemsoftware
> > > > > > Universitaetsrechenzentrum
> > > > > >
> > > > > > Technische Universität Chemnitz
> > > > > > Straße der Nationen 62 (Raum B303)
> > > > > > 09111 Chemnitz
> > > > > > Germany
> > > > > >
> > > > > > Tel: +49 371 531 35444
> > > > > > Fax: +49 371 531 835444
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > > >
> > > > --
> > > > Daniel Schreiber
> > > > Facharbeitsgruppe Systemsoftware
> > > > Universitaetsrechenzentrum
> > > >
> > > > Technische Universität Chemnitz
> > > > Straße der Nationen 62 (Raum B303)
> > > > 09111 Chemnitz
> > > > Germany
> > > >
> > > > Tel: +49 371 531 35444
> > > > Fax: +49 371 531 835444
> > > > _______________________________________________
> > > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> > >
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
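
For anyone who finds this thread later, here is a minimal shell sketch of the unit.run
workaround Adam describes above. The fsid lookup, the old image name, the daemon name,
the unit.run path and the systemd unit name are placeholders/assumptions, not verified
on the clusters in this thread (the original poster mentions /var/run/ceph/..., while
cephadm usually keeps unit.run under /var/lib/ceph/<fsid>/<daemon>/), so check every
value against your own hosts before running anything:

~~~
# Run on the host of the crashed mgr. All values below are assumptions taken
# from this thread; adjust them to your cluster.
FSID=$(ceph fsid)                      # cluster fsid (works via the mons)
DAEMON=mgr.cephadm.mxrhsp              # placeholder daemon name from the thread
OLD_IMAGE=quay.io/ceph/ceph:v17.2.2    # assumption: whatever image unit.run currently references
FIX_IMAGE=quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531

# Replace every occurrence of the old image reference on the (single, very long)
# podman/docker run line; it may appear both after CONTAINER_IMAGE= and as the
# final image argument.
sed -i "s#${OLD_IMAGE}#${FIX_IMAGE}#g" "/var/lib/ceph/${FSID}/${DAEMON}/unit.run"

# Restart the daemon's systemd unit so the new image is picked up.
systemctl restart "ceph-${FSID}@${DAEMON}.service"
~~~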
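And once at least one mgr is healthy again, a sketch of the orchestrator-side cleanup for
the situation described at the top of the thread (one mgr stuck in "error" in "ceph orch ps",
upgrade refusing to start). The redeploy command is the one Vikhyat quotes; the --refresh
flag, the target-image placeholder and the ordering are assumptions, not something verified
on this particular cluster:

~~~
# Redeploy the daemon the orchestrator still reports as "error", using the
# fixed image (daemon name taken from the orch ps output quoted above).
ceph orch daemon redeploy mgr.ceph01.fwtity \
    quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531

# Ask cephadm to refresh its inventory and confirm both mgrs show "running".
ceph orch ps --daemon-type mgr --refresh

# With two running mgrs, the upgrade should no longer complain about having
# only one.
ceph orch upgrade start --image <target-image>
ceph orch upgrade status
~~~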