Re: 17.2.2: all MGRs crashing in fresh cephadm install

Hi Daniel,

This issue appears to affect 17.2.2; details are in
https://tracker.ceph.com/issues/55304. We are currently validating
the fix in https://github.com/ceph/ceph/pull/47270 and will try to
expedite a quick fix.

In the meantime, dev builds and container images with the fix are
available, in case you want to give it a try:
https://shaman.ceph.com/builds/ceph/wip-quincy-libcephsqlite-fix/
quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531
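For context, the backtrace quoted below shows where the abort comes from: in the devicehealth thread, libcephsqlite.so (frames 23-24) constructs a std::regex, libstdc++ rejects a bracket expression while compiling it (frames 8-11, `_M_bracket_expression` -> `__throw_regex_error`), and the uncaught std::regex_error ends in abort(). One well-known way to trigger this with recent, stricter libstdc++ versions is a hyphen placed directly after a character class inside brackets, which is parsed as an invalid range instead of a literal '-'. Python's `re` module enforces the same rule, so the sketch below uses it as an analogy. The patterns and the `compile_pattern` helper are hypothetical illustrations, not the actual libcephsqlite regex or the change in the PR.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical patterns for illustration only; NOT the regexes used by
# libcephsqlite. Both are meant to match names like ".mgr" or "devicehealth".
AMBIGUOUS = r"[\w-.]+"  # '-' directly after the class \w: rejected as a bad range
SAFE = r"[\w.-]+"       # '-' last in the bracket: always a literal hyphen


def compile_pattern(pattern: str) -> Optional[re.Pattern]:
    """Compile `pattern`, returning None instead of propagating re.error.

    In C++, an uncaught std::regex_error thrown by the std::regex
    constructor ends in std::terminate() -> abort(), as in the backtrace.
    Catching the error keeps one bad pattern from taking the process down.
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        print(f"rejected {pattern!r}: {exc}")
        return None


if __name__ == "__main__":
    assert compile_pattern(AMBIGUOUS) is None
    safe = compile_pattern(SAFE)
    assert safe is not None
    # Split the path from the log line "/.mgr:devicehealth/main.db"
    print([m.group(0) for m in safe.finditer("/.mgr:devicehealth/main.db")])
    # -> ['.mgr', 'devicehealth', 'main.db']
```

Putting the hyphen at the end of the bracket expression keeps the intended meaning under both lenient and strict regex engines, which is why it is a common way to make such patterns portable.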

Thanks,
Neha



On Wed, Jul 27, 2022 at 8:10 AM Daniel Schreiber
<daniel.schreiber@xxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I installed a fresh cluster using cephadm:
>
> - bootstrapped one node
> - extended it to 3 monitor nodes, each running mon + mgr, using a
> spec file
> - added 12 OSD hosts to the spec file with the following disk rules:
>
> ~~~
> service_type: osd
> service_id: osd_spec_hdd
> placement:
>    label: osd
> spec:
>    data_devices:
>      model: "HGST HUH721212AL" # HDDs
>    db_devices:
>      model: "SAMSUNG MZ7KH1T9" # SATA SSDs
>
> ---
>
> service_type: osd
> service_id: osd_spec_nvme
> placement:
>    label: osd
> spec:
>    data_devices:
>      model: "SAMSUNG MZPLL1T6HAJQ-00005" # NVMEs
> ~~~
>
> The HDD + SSD OSDs were deployed; the NVMe OSDs were not.
>
> MGRs crashed, one after the other:
>
> debug    -65> 2022-07-25T17:06:36.507+0000 7f4a33f80700  5 cephsqlite: FullPathname: (client.17139) 1: /.mgr:devicehealth/main.db
> debug    -64> 2022-07-25T17:06:36.507+0000 7f4a34f82700  0 [dashboard INFO sso] Loading SSO DB version=1
> debug    -63> 2022-07-25T17:06:36.507+0000 7f4a34f82700  4 mgr get_store get_store key: mgr/dashboard/ssodb_v1
> debug    -62> 2022-07-25T17:06:36.507+0000 7f4a34f82700  4 ceph_store_get ssodb_v1 not found
> debug    -61> 2022-07-25T17:06:36.507+0000 7f4a34f82700  0 [dashboard INFO root] server: ssl=no host=:: port=8080
> debug    -60> 2022-07-25T17:06:36.507+0000 7f4a34f82700  0 [dashboard INFO root] Configured CherryPy, starting engine...
> debug    -59> 2022-07-25T17:06:36.507+0000 7f4a34f82700  4 mgr set_uri module dashboard set URI 'http://192.168.14.201:8080/'
> debug    -58> 2022-07-25T17:06:36.511+0000 7f4a64e91700  4 ceph_store_get active_devices not found
> debug    -57> 2022-07-25T17:06:36.511+0000 7f4a33f80700 -1 *** Caught signal (Aborted) **
>   in thread 7f4a33f80700 thread_name:devicehealth
>   ceph version 17.2.2 (b6e46b8939c67a6cc754abb4d0ece3c8918eccc3) quincy (stable)
>   1: /lib64/libpthread.so.0(+0x12ce0) [0x7f4a9b0d0ce0]
>   2: gsignal()
>   3: abort()
>   4: /lib64/libstdc++.so.6(+0x9009b) [0x7f4a9a4cf09b]
>   5: /lib64/libstdc++.so.6(+0x9653c) [0x7f4a9a4d553c]
>   6: /lib64/libstdc++.so.6(+0x96597) [0x7f4a9a4d5597]
>   7: /lib64/libstdc++.so.6(+0x967f8) [0x7f4a9a4d57f8]
>   8: (std::__throw_regex_error(std::regex_constants::error_type, char const*)+0x4a) [0x5607b31d5eea]
>   9: (bool std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_expression_term<false, false>(std::__detail::_Compiler<std::__cxx11::regex>
>   10: (void std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_insert_bracket_matcher<false, false>(bool)+0x146) [0x5607b31e26b6]
>   11: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_bracket_expression()+0x6b) [0x5607b31e663b]
>   12: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x6a) [0x5607b31e671a]
>   13: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0xd0) [0x5607b31e6ca0]
>   14: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x30) [0x5607b31e6df0]
>   15: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom()+0x338) [0x5607b31e69e8]
>   16: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0xd0) [0x5607b31e6ca0]
>   17: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
>   18: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
>   19: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
>   20: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()+0x42) [0x5607b31e6c12]
>   21: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()+0x30) [0x5607b31e6df0]
>   22: (std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syn>
>   23: /lib64/libcephsqlite.so(+0x1b7ca) [0x7f4a9d8ba7ca]
>   24: /lib64/libcephsqlite.so(+0x24486) [0x7f4a9d8c3486]
>   25: /lib64/libsqlite3.so.0(+0x75f1c) [0x7f4a9d600f1c]
>   26: /lib64/libsqlite3.so.0(+0xdd4c9) [0x7f4a9d6684c9]
>   27: pysqlite_connection_init()
>   28: /lib64/libpython3.6m.so.1.0(+0x13afc6) [0x7f4a9d182fc6]
>   29: PyObject_Call()
>   30: /lib64/python3.6/lib-dynload/_sqlite3.cpython-36m-x86_64-linux-gnu.so(+0xa1f5) [0x7f4a8bdf31f5]
>   31: /lib64/libpython3.6m.so.1.0(+0x19d5f1) [0x7f4a9d1e55f1]
>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Is there anything I can do to recover from this? Is there anything I can
> add to help debug this?
>
> Thank you,
>
> Daniel
> --
> Daniel Schreiber
> Facharbeitsgruppe Systemsoftware
> Universitaetsrechenzentrum
>
> Technische Universität Chemnitz
> Straße der Nationen 62 (Raum B303)
> 09111 Chemnitz
> Germany
>
> Tel:     +49 371 531 35444
> Fax:     +49 371 531 835444
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
