Let us know how you go with the older kernel that is known to work.

On Tue, Aug 17, 2021 at 12:31 AM Links 2004 <links2004.code@xxxxxxxxx> wrote:
>
> Only one of the 4 servers that have access to the cluster is failing.
> Since we were in the middle of an update, the kernel and OS versions are a bit all over the place, but here we go:
>
> bad:
> Debian 10 (buster) - 5.10.0-0.bpo.8-amd64 #1 SMP Debian 5.10.46-2~bpo10+1 (2021-07-22) x86_64 GNU/Linux
>
> good:
> Debian 10 (buster) - 5.10.0-0.bpo.4-amd64 #1 SMP Debian 5.10.19-1~bpo10+1 (2021-03-13) x86_64 GNU/Linux
> Debian 11 (bullseye) - 5.11.22-3-pve #1 SMP PVE 5.11.22-6 (Wed, 28 Jul 2021 10:51:12 +0200) x86_64 GNU/Linux
> Debian 11 (bullseye) - 5.10.0-8-amd64 #1 SMP Debian 5.10.46-3 (2021-07-28) x86_64 GNU/Linux
>
> The Proxmox one is at:
> ceph version 16.2.5 (9b9dd76e12f1907fe5dcc0c1fadadbb784022a42) pacific (stable)
>
> The rest are at:
> ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)
>
> All 3 other servers run all ceph commands fine,
> and 2 of them have a mon running fine too.
> The 3rd mon was on the bad one and is not starting, failing with the same error.
>
> We stopped the bullseye update since we may kill a second server, which would be bad for the cluster ;)
>
> As soon as possible we will try going back to 5.10.0-0.bpo.4-amd64 on the bad one,
> since this is known to work on one of the other servers.
>
> The hardware of the servers cannot really be compared, since this is a testing / dev cluster composed of old hardware.
> We tried the update there first so as not to kill the production environment.
>
>
> On Sun, Aug 15, 2021 at 00:13, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Sat, Aug 14, 2021 at 7:13 PM Links 2004 <links2004.code@xxxxxxxxx> wrote:
>> >
>> > No kernel update recently, and it is a server, so no keyboard etc., but entropy_avail looks good and is in the same range (+-80) as on the other server.
>> > dmesg | grep random has no results.
>> > # cat /proc/sys/kernel/random/entropy_avail
>> > 3547
>> >
>> > All servers run
>> > ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)
>>
>> Can you give us some more information?
>>
>> Do all of your servers fail all of the time or do some fail some of the time?
>>
>> What do the ones that fail have in common and how do they differ from
>> the ones that do not fail?
>>
>> If you have the option to try alternative kernels on the servers that
>> exhibit this behavior I'd suggest that might be a good next step.
>>
>> >
>> >
>> >
>> >
>> > On Sat, Aug 14, 2021 at 10:55, kefu chai <tchaikov@xxxxxxxxx> wrote:
>> >>
>> >>
>> >>
>> >> On Sat, Aug 14, 2021 at 06:11, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>> >>>
>> >>> On Sat, Aug 14, 2021 at 4:06 AM Links 2004 <links2004.code@xxxxxxxxx> wrote:
>> >>> >
>> >>> >
>> >>> > Hi,
>> >>> >
>> >>> > we are currently facing a strange problem on one of our ceph nodes.
>> >>> > It is not possible to call `ceph -s` or start a mgr without getting a 'std::runtime_error'.
>> >>> >
>> >>> > Find below the error message and a gdb backtrace with debug symbols.
>> >>> > We hope this helps to understand the problem and point us in the right direction.
>> >>> >
>> >>> > Thanks
>> >>> >
>> >>> > Markus
>> >>> >
>> >>> >
>> >>> > OS: Debian buster
>> >>> > kernel: 5.10.0-0.bpo.8-amd64 #1 SMP Debian 5.10.46-2~bpo10+1 (2021-07-22) x86_64 GNU/Linux
>> >>> >
>> >>> > ```
>> >>> > # ceph -v
>> >>> > ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)
>> >>> > ```
>> >>> >
>> >>> > ```
>> >>> > # ceph -s
>> >>> > terminate called after throwing an instance of 'std::runtime_error'
>> >>> > what(): random_device::__x86_rdrand(void)
>> >>> > Aborted
>> >>> > ```
>> >>>
>> >>> Did this issue coincide with a kernel upgrade?
>> >>>
>> >>> Can you try and generate a lot of entropy on the system and see if the
>> >>> issue goes away?
>> >>>
>> >>> Also check the output of 'dmesg | grep random' to see if that offers any clues.
>> >>
>> >>
>> >> I feel the same. It looks likely that the kernel did not have enough entropy by then. Is the system not connected to a keyboard? Or was it just booted? If that's the case, you could probably wait a while before trying to launch the mgr or use the ceph command line utility.
>> >>>
>> >>>
>> >>>
>> >>> >
>> >>> > ```
>> >>> > # gdb --args /usr/bin/python3.7 /usr/bin/ceph -s
>> >>> > GNU gdb (Debian 8.2.1-2+b3) 8.2.1
>> >>> > Copyright (C) 2018 Free Software Foundation, Inc.
>> >>> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> >>> > This is free software: you are free to change and redistribute it.
>> >>> > There is NO WARRANTY, to the extent permitted by law.
>> >>> > Type "show copying" and "show warranty" for details.
>> >>> > This GDB was configured as "x86_64-linux-gnu".
>> >>> > Type "show configuration" for configuration details.
>> >>> > For bug reporting instructions, please see:
>> >>> > <http://www.gnu.org/software/gdb/bugs/>.
>> >>> > Find the GDB manual and other documentation resources online at:
>> >>> > <http://www.gnu.org/software/gdb/documentation/>.
>> >>> >
>> >>> > For help, type "help".
>> >>> > Type "apropos word" to search for commands related to "word"...
>> >>> > Reading symbols from /usr/bin/python3.7...Reading symbols from /usr/lib/debug/.build-id/99/21c75e6930d3e9d9fa8c942aca9dc4500bb65f.debug...done.
>> >>> > done.
>> >>> > (gdb) run
>> >>> > Starting program: /usr/bin/python3.7 /usr/bin/ceph -s
>> >>> > [Thread debugging using libthread_db enabled]
>> >>> > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> >>> >
>> >>> > [New Thread 0x7fffed515700 (LWP 30323)]
>> >>> > [New Thread 0x7fffe7fff700 (LWP 30324)]
>> >>> > [New Thread 0x7fffe77fe700 (LWP 30325)]
>> >>> > [Thread 0x7fffed515700 (LWP 30323) exited]
>> >>> > [New Thread 0x7fffed515700 (LWP 30326)]
>> >>> > [Thread 0x7fffe7fff700 (LWP 30324) exited]
>> >>> > terminate called after throwing an instance of 'std::runtime_error'
>> >>> > what(): random_device::__x86_rdrand(void)
>> >>> >
>> >>> > Thread 4 "python3.7" received signal SIGABRT, Aborted.
>> >>> > [Switching to Thread 0x7fffe77fe700 (LWP 30325)]
>> >>> > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>> >>> > 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> >>> > (gdb)
>> >>> > (gdb) bt
>> >>> > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>> >>> > #1 0x00007ffff79de535 in __GI_abort () at abort.c:79
>> >>> > #2 0x00007fffeddb8983 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #3 0x00007fffeddbe8c6 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #4 0x00007fffeddbe901 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #5 0x00007fffeddbeb34 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #6 0x00007fffeddba8b7 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #7 0x00007fffedde6e86 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #8 0x00007fffedde6fd2 in std::random_device::_M_getval() () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #9 0x00007fffee540ffc in std::random_device::operator() (this=0x7fffe77fba60) at /usr/include/c++/8/bits/random.h:1611
>> >>> > #10 ceph::util::version_1_0_3::detail::randomize_rng<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > () at ./src/include/random.h:120
>> >>> > #11 0x00007fffee54112f in ceph::util::version_1_0_3::detail::engine<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > ()
>> >>> >     at /usr/include/c++/8/new:169
>> >>> > #12 0x00007fffee772911 in ceph::util::version_1_0_3::detail::generate_random_number<unsigned long, std::uniform_int_distribution<unsigned long>, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (min=min@entry=0, max=max@entry=18446744073709551615) at ./src/include/random.h:170
>> >>> > #13 0x00007fffee7718fe in ceph::util::version_1_0_3::generate_random_number<unsigned long, std::uniform_int_distribution<unsigned long>, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > () at ./src/include/random.h:203
>> >>> > #14 Messenger::get_random_nonce () at ./src/msg/Messenger.cc:33
>> >>> > #15 0x00007fffee771dd6 in Messenger::create_client_messenger (cct=0x7fffe80011c0, lname="temp_mon_client") at ./src/msg/Messenger.cc:16
>> >>> > #16 0x00007fffee82c862 in MonClient::get_monmap_and_config (this=this@entry=0x7fffe77fd080) at /usr/include/c++/8/ext/new_allocator.h:79
>> >>> > #17 0x00007ffff6fd9e14 in librados::v14_2_0::RadosClient::connect (this=this@entry=0x7fffe805ae80) at ./src/librados/RadosClient.cc:227
>> >>> > #18 0x00007ffff6f66a0f in _rados_connect (cluster=0x7fffe805ae80) at ./src/librados/librados_c.cc:204
>> >>> > #19 0x00007ffff71bf0b5 in ?? () from /usr/lib/python3/dist-packages/rados.cpython-37m-x86_64-linux-gnu.so
>> >>> > #20 0x00007ffff71380ec in ?? () from /usr/lib/python3/dist-packages/rados.cpython-37m-x86_64-linux-gnu.so
>> >>> > #21 0x00000000004d9850 in _PyObject_FastCallDict (kwargs={}, nargs=1, args=0x7fffe77fd880, callable=<cython_function_or_method at remote 0x7fffed6ae100>)
>> >>> >     at ../Objects/call.c:125
>> >>> > #22 _PyObject_Call_Prepend (kwargs={}, args=<optimized out>, obj=<optimized out>, callable=<cython_function_or_method at remote 0x7fffed6ae100>) at ../Objects/call.c:904
>> >>> > #23 method_call (method=<optimized out>, args=<optimized out>, kwargs=<optimized out>, method=<optimized out>, args=<optimized out>, kwargs=<optimized out>)
>> >>> >     at ../Objects/classobject.c:309
>> >>> > #24 0x00000000005dc4f6 in PyObject_Call (callable=<method at remote 0x7ffff7634dc8>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:245
>> >>> > #25 0x000000000054f987 in do_call_core (kwdict={}, callargs=(), func=<method at remote 0x7ffff7634dc8>) at ../Python/ceval.c:4645
>> >>> > #26 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3191
>> >>> > #27 0x00000000005d91fc in PyEval_EvalFrameEx (throwflag=0,
>> >>> >     f=Frame 0x7ffff72c6bb8, for file /usr/lib/python3/dist-packages/ceph_argparse.py, line 1458, in run (self=<RadosThread(args=(), kwargs={}, func=<method at remote 0x7ffff7634dc8>, exception=None, _target=None, _name='Thread-3', _args=(...), _kwargs={}, _daemonic=True, _ident=140737077307136, _tstate_lock=<_thread.lock at remote 0x7fffed63af58>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7fffed63ad00>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fffed63ad00>, release=<built-in method release of _thread.lock object at remote 0x7fffed63ad00>, _waiters=<collections.deque at remote 0x7fffed6d5a70>) at remote 0x7fffed533470>, _flag=True) at remote 0x7fffed533438>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7ffff7629708>) at remote 0x7fffed533358>)) at ../Python/ceval.c:547
>> >>> > #28 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
>> >>> > --Type <RET> for more, q to quit, c to continue without paging--c
>> >>> > #29 _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:408
>> >>> > #30 0x000000000054e7e0 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=0x7fffe77fdb30) at ../Python/ceval.c:4616
>> >>> > #31 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3110
>> >>> > #32 0x00000000005d91fc in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7fffe8000b38, for file /usr/lib/python3.7/threading.py, line 917, in _bootstrap_inner (self=<RadosThread(args=(), kwargs={}, func=<method at remote 0x7ffff7634dc8>, exception=None, _target=None, _name='Thread-3', _args=(...), _kwargs={}, _daemonic=True, _ident=140737077307136, _tstate_lock=<_thread.lock at remote 0x7fffed63af58>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7fffed63ad00>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fffed63ad00>, release=<built-in method release of _thread.lock object at remote 0x7fffed63ad00>, _waiters=<collections.deque at remote 0x7fffed6d5a70>) at remote 0x7fffed533470>, _flag=True) at remote 0x7fffed533438>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7ffff7629708>) at remote 0x7fffed533358>)) at ../Python/ceval.c:547
>> >>> > #33 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
>> >>> > #34 _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:408
>> >>> > #35 0x000000000054e7e0 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=0x7fffe77fdcc0) at ../Python/ceval.c:4616
>> >>> > #36 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3110
>> >>> > #37 0x00000000005da536 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7fffed538048, for file /usr/lib/python3.7/threading.py, line 885, in _bootstrap (self=<RadosThread(args=(), kwargs={}, func=<method at remote 0x7ffff7634dc8>, exception=None, _target=None, _name='Thread-3', _args=(...), _kwargs={}, _daemonic=True, _ident=140737077307136, _tstate_lock=<_thread.lock at remote 0x7fffed63af58>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7fffed63ad00>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fffed63ad00>, release=<built-in method release of _thread.lock object at remote 0x7fffed63ad00>, _waiters=<collections.deque at remote 0x7fffed6d5a70>) at remote 0x7fffed533470>, _flag=True) at remote 0x7fffed533438>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7ffff7629708>) at remote 0x7fffed533358>)) at ../Python/ceval.c:547
>> >>> > #38 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
>> >>> > #39 _PyFunction_FastCallDict (func=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:322
>> >>> > #40 0x00000000004d97e2 in _PyObject_FastCallDict (kwargs=0x0, nargs=1, args=0x7fffe77fde00, callable=<function at remote 0x7ffff73ead90>) at ../Objects/call.c:98
>> >>> > #41 _PyObject_Call_Prepend (kwargs=0x0, args=<optimized out>, obj=<RadosThread(args=(), kwargs={}, func=<method at remote 0x7ffff7634dc8>, exception=None, _target=None, _name='Thread-3', _args=(...), _kwargs={}, _daemonic=True, _ident=140737077307136, _tstate_lock=<_thread.lock at remote 0x7fffed63af58>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7fffed63ad00>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fffed63ad00>, release=<built-in method release of _thread.lock object at remote 0x7fffed63ad00>, _waiters=<collections.deque at remote 0x7fffed6d5a70>) at remote 0x7fffed533470>, _flag=True) at remote 0x7fffed533438>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7ffff7629708>) at remote 0x7fffed533358>, callable=<function at remote 0x7ffff73ead90>) at ../Objects/call.c:904
>> >>> > #42 method_call (method=<optimized out>, args=<optimized out>, kwargs=<optimized out>, method=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/classobject.c:309
>> >>> > #43 0x00000000005dc4f6 in PyObject_Call (callable=<method at remote 0x7ffff7634b08>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:245
>> >>> > #44 0x0000000000617b63 in t_bootstrap (boot_raw=boot_raw@entry=0x7fffed63ae40) at ../Modules/_threadmodule.c:994
>> >>> > #45 0x000000000062dfe4 in pythread_wrapper (arg=<optimized out>) at ../Python/thread_pthread.h:174
>> >>> > #46 0x00007ffff7f6efa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
>> >>> > #47 0x00007ffff7ab54cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>> >>> > ```
>> >>> > _______________________________________________
>> >>> > Dev mailing list -- dev@xxxxxxx
>> >>> > To unsubscribe send an email to dev-leave@xxxxxxx
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Cheers,
>> >>> Brad
>> >>>
>> >>> _______________________________________________
>> >>> Dev mailing list -- dev@xxxxxxx
>> >>> To unsubscribe send an email to dev-leave@xxxxxxx
>> >>
>> >> --
>> >> Regards
>> >> Kefu Chai
>>
>>
>>
>> --
>> Cheers,
>> Brad
>>

--
Cheers,
Brad
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
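
The thread compares hosts by kernel version and by `/proc/sys/kernel/random/entropy_avail`. Since the exception text names `__x86_rdrand`, it may also be worth recording whether each host's CPU advertises the `rdrand` flag at all. That extra check is an assumption on my part and was not suggested by anyone in the thread. Below is a minimal Python sketch that gathers both values so the good and bad servers can be compared side by side. The function names (`cpu_flags`, `parse_entropy_avail`, `host_report`) are hypothetical helpers, and the sketch only reads standard Linux `/proc` files. An advertised flag does not prove that the instruction actually succeeds at runtime.

```python
"""Sketch: collect host facts relevant to the rdrand failure (hypothetical helper)."""

CPUINFO_PATH = "/proc/cpuinfo"
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No "flags" line (e.g. non-x86 hosts): report an empty set.
    return set()


def parse_entropy_avail(text):
    """Parse the single integer found in entropy_avail."""
    return int(text.strip())


def read_text(path):
    """Read a small text file, returning None if it cannot be opened."""
    try:
        with open(path, encoding="ascii", errors="replace") as handle:
            return handle.read()
    except OSError:
        return None


def host_report():
    """Return a dict with the rdrand flag state and entropy_avail, or None if unknown."""
    cpuinfo = read_text(CPUINFO_PATH)
    entropy = read_text(ENTROPY_PATH)
    try:
        entropy_value = None if entropy is None else parse_entropy_avail(entropy)
    except ValueError:
        entropy_value = None
    return {
        "rdrand_advertised": None if cpuinfo is None else "rdrand" in cpu_flags(cpuinfo),
        "entropy_avail": entropy_value,
    }


if __name__ == "__main__":
    print(host_report())
```

Running this on each of the four servers and diffing the results alongside `uname -a` would give one more data point for the good/bad comparison requested earlier in the thread.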