Hi,
thanks, that worked. I deployed the first MGR manually and the others
using the orchestrator.
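
For the archive, roughly the commands this boiled down to (fsid and
daemon names are placeholders; the unit.run edit is the one Adam
describes below):

  systemctl restart ceph-<fsid>@mgr.<first-mgr>   # after pointing its unit.run at the fixed image
  ceph orch daemon redeploy mgr.<other-mgr> quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531
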
Thank you so much.
Daniel
On 27.07.22 at 18:23, Adam King wrote:
Yeah, that works if there is a working mgr to send the command to. I was
assuming here that all the mgr daemons were down, since it was a fresh
cluster and so all the mgrs would have this bugged image.
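
(A quick way to check whether any mgr is still up is:

  ceph -s

which only needs the mons to respond and reports the active/standby mgr
daemons.)
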
On Wed, Jul 27, 2022 at 12:07 PM Vikhyat Umrao <vikhyat@xxxxxxxxxx> wrote:
Adam - or we could simply redeploy the daemon with the new image? At
least this is something I did in our testing here [1]:

  ceph orch daemon redeploy mgr.<name> quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531
[1] https://github.com/ceph/ceph/pull/47270#issuecomment-1196062363
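
Once a working mgr is back, the result can be verified with

  ceph orch ps --daemon-type mgr

which lists each mgr daemon together with the container image it is
running.
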
On Wed, Jul 27, 2022 at 8:55 AM Adam King <adking@xxxxxxxxxx> wrote:
The unit.image file is just there for cephadm to look at as part of
gathering metadata, I think. What you'd want to edit is the unit.run
file (in the same directory as unit.image). It should have a really long
line specifying a podman/docker run command, and somewhere in there will
be "CONTAINER_IMAGE=<old-image-name>". You'd need to change that to
"CONTAINER_IMAGE=quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531"
and then restart the service.
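
For example (fsid, daemon name, and the old image string are
placeholders; the old image may also appear as the image argument at the
end of the run command, so replacing every occurrence of it is the safer
edit):

  grep -o 'CONTAINER_IMAGE=[^ ]*' /var/lib/ceph/<fsid>/mgr.<name>/unit.run   # note the old image
  sed -i 's#<old-image>#quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531#g' \
      /var/lib/ceph/<fsid>/mgr.<name>/unit.run
  systemctl restart ceph-<fsid>@mgr.<name>   # cephadm units are typically named ceph-<fsid>@<daemon>
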
On Wed, Jul 27, 2022 at 11:46 AM Daniel Schreiber
<daniel.schreiber@xxxxxxxxxxxxxxxxxx> wrote:
> Hi Neha,
>
> thanks for the quick response. Sorry for the stupid question: to use
> that image, do I pull it on the machine, then change
> /var/lib/ceph/${clusterid}/mgr.${unit}/unit.image, and start the
> service?
>
> Thanks,
>
> Daniel
>
> On 27.07.22 at 17:23, Neha Ojha wrote:
> > Hi Daniel,
> >
> > This issue seems to be showing up in 17.2.2, details in
> > https://tracker.ceph.com/issues/55304. We are currently in the
> > process of validating the fix
> > https://github.com/ceph/ceph/pull/47270 and we'll try to expedite a
> > release with the fix.
> >
> > In the meantime, we have builds/images of the dev version of the
> > fix, in case you want to give it a try:
> >
> > https://shaman.ceph.com/builds/ceph/wip-quincy-libcephsqlite-fix/
> > quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531
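> >
> > (The image can be pulled on each host beforehand with, e.g.,
> > "podman pull quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531",
> > or the docker equivalent.)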
> >
> > Thanks,
> > Neha
> >
> >
> >
> > On Wed, Jul 27, 2022 at 8:10 AM Daniel Schreiber
> > <daniel.schreiber@xxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> I installed a fresh cluster using cephadm:
> >>
> >> - bootstrapped one node
> >> - extended it to 3 monitor nodes, each running mon + mgr, using a
> >>   spec file
> >> - added 12 OSD hosts to the spec file with the following disk rules
> >>   (the corresponding commands are sketched after the specs):
> >>
> >> ~~~
> >> service_type: osd
> >> service_id: osd_spec_hdd
> >> placement:
> >>   label: osd
> >> spec:
> >>   data_devices:
> >>     model: "HGST HUH721212AL" # HDDs
> >>   db_devices:
> >>     model: "SAMSUNG MZ7KH1T9" # SATA SSDs
> >>
> >> ---
> >>
> >> service_type: osd
> >> service_id: osd_spec_nvme
> >> placement:
> >>   label: osd
> >> spec:
> >>   data_devices:
> >>     model: "SAMSUNG MZPLL1T6HAJQ-00005" # NVMes
> >> ~~~
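> >>
> >> (Roughly, the commands behind those steps; host, address, and file
> >> names are placeholders:
> >>
> >>   cephadm bootstrap --mon-ip <ip>
> >>   ceph orch host add <host> <addr>
> >>   ceph orch host label add <host> osd
> >>   ceph orch apply -i osd-specs.yaml
> >> )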
> >>
> >> OSDs on HDD + SSD were deployed; NVMe OSDs were not.
> >>
> >> MGRs crashed, one after the other:
> >>
> >> debug -65> 2022-07-25T17:06:36.507+0000 7f4a33f80700 5 cephsqlite: FullPathname: (client.17139) 1: /.mgr:devicehealth/main.db
> >> debug -64> 2022-07-25T17:06:36.507+0000 7f4a34f82700 0 [dashboard INFO sso] Loading SSO DB version=1
> >> debug -63> 2022-07-25T17:06:36.507+0000 7f4a34f82700 4 mgr get_store get_store key: mgr/dashboard/ssodb_v1
> >> debug -62> 2022-07-25T17:06:36.507+0000 7f4a34f82700 4 ceph_store_get ssodb_v1 not found
> >> debug -61> 2022-07-25T17:06:36.507+0000 7f4a34f82700 0 [dashboard INFO root] server: ssl=no host=:: port=8080
> >> debug -60> 2022-07-25T17:06:36.507+0000 7f4a34f82700 0 [dashboard INFO root] Configured CherryPy, starting engine...
> >> debug -59> 2022-07-25T17:06:36.507+0000 7f4a34f82700 4 mgr set_uri module dashboard set URI 'http://192.168.14.201:8080/'
> >> debug -58> 2022-07-25T17:06:36.511+0000 7f4a64e91700 4 ceph_store_get active_devices not found
> >> debug -57> 2022-07-25T17:06:36.511+0000 7f4a33f80700 -1 *** Caught signal (Aborted) **
> >> in thread 7f4a33f80700 thread_name:devicehealth
> >> ceph version 17.2.2 (b6e46b8939c67a6cc754abb4d0ece3c8918eccc3) quincy (stable)
> >> 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f4a9b0d0ce0]
> >> 2: gsignal()
> >> 3: abort()
> >> 4: /lib64/libstdc++.so.6(+0x9009b) [0x7f4a9a4cf09b]
> >> 5: /lib64/libstdc++.so.6(+0x9653c) [0x7f4a9a4d553c]
> >> 6: /lib64/libstdc++.so.6(+0x96597) [0x7f4a9a4d5597]
> >> 7: /lib64/libstdc++.so.6(+0x967f8) [0x7f4a9a4d57f8]
> >> 8: (std::__throw_regex_error(std::regex_constants::error_type, char const*)+0x4a) [0x5607b31d5eea]
> >> 9: (bool std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_expression_term<false, false>(std::__detail::_Compiler<std::__cxx11::regex>
> >> 10: (void std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_insert_bracket_matcher<false, false>(bool)+0x146) [0x5607b31e26b6]
> >> 11: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_bracket_expression()+0x6b) [0x5607b31e663b]
> >> 12: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x6a) [0x5607b31e671a]
> >> 13: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0xd0) [0x5607b31e6ca0]
> >> 14: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x30) [0x5607b31e6df0]
> >> 15: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x338) [0x5607b31e69e8]
> >> 16: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0xd0) [0x5607b31e6ca0]
> >> 17: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
> >> 18: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
> >> 19: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
> >> 20: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
> >> 21: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x30) [0x5607b31e6df0]
> >> 22: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syn>
> >> 23: /lib64/libcephsqlite.so(+0x1b7ca) [0x7f4a9d8ba7ca]
> >> 24: /lib64/libcephsqlite.so(+0x24486) [0x7f4a9d8c3486]
> >> 25: /lib64/libsqlite3.so.0(+0x75f1c) [0x7f4a9d600f1c]
> >> 26: /lib64/libsqlite3.so.0(+0xdd4c9) [0x7f4a9d6684c9]
> >> 27: pysqlite_connection_init()
> >> 28: /lib64/libpython3.6m.so.1.0(+0x13afc6) [0x7f4a9d182fc6]
> >> 29: PyObject_Call()
> >> 30: /lib64/python3.6/lib-dynload/_sqlite3.cpython-36m-x86_64-linux-gnu.so(+0xa1f5) [0x7f4a8bdf31f5]
> >> 31: /lib64/libpython3.6m.so.1.0(+0x19d5f1) [0x7f4a9d1e55f1]
> >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> >>
> >> Is there anything I can do to recover from this? Is there anything
> >> I can add to help debug this?
> >>
> >> Thank you,
> >>
> >> Daniel
--
Daniel Schreiber
Facharbeitsgruppe Systemsoftware
Universitaetsrechenzentrum
Technische Universität Chemnitz
Straße der Nationen 62 (Raum B303)
09111 Chemnitz
Germany
Tel: +49 371 531 35444
Fax: +49 371 531 835444
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx