Re: ceph -s throwing an instance of 'std::runtime_error' random_device::__x86_rdrand(void)

Let us know how you go with the older kernel that is known to work.
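
To compare the failing node with the working ones before and after the kernel change, a small snapshot script may help. The following is only a minimal sketch and not part of Ceph: the function names are made up for illustration, and it assumes nothing beyond the standard Linux files /proc/cpuinfo and /proc/sys/kernel/random/entropy_avail. It reports the running kernel, the entropy estimate, and whether the CPU advertises the rdrand flag, since the abort message names __x86_rdrand.

```python
import platform
from pathlib import Path


def cpu_has_flag(cpuinfo_text, flag):
    """Return True if any 'flags' line in /proc/cpuinfo text lists the flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


def read_text_or_none(path):
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


def snapshot():
    """Collect the values worth comparing between good and bad hosts."""
    cpuinfo = read_text_or_none("/proc/cpuinfo")
    entropy = read_text_or_none("/proc/sys/kernel/random/entropy_avail")
    has_entropy = entropy is not None and entropy.strip().isdigit()
    return {
        "kernel": platform.release(),
        "entropy_avail": int(entropy) if has_entropy else None,
        "cpu_has_rdrand": cpu_has_flag(cpuinfo, "rdrand") if cpuinfo else None,
    }


if __name__ == "__main__":
    for key, value in snapshot().items():
        print(f"{key}: {value}")
```

Running it on each of the four hosts and diffing the output is one way to see what the failing node does not share with the others. Note that an advertised rdrand flag only says the CPU claims support; it does not by itself show that the instruction succeeds at runtime.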

On Tue, Aug 17, 2021 at 12:31 AM Links 2004 <links2004.code@xxxxxxxxx> wrote:
>
> Only one of the 4 servers that have access to the cluster is failing.
> Since we were in the middle of an update, the kernel and OS versions are a bit all over the place, but here we go:
>
> bad:
> Debian 10 (buster)  - 5.10.0-0.bpo.8-amd64 #1 SMP Debian 5.10.46-2~bpo10+1 (2021-07-22) x86_64 GNU/Linux
>
> good:
> Debian 10 (buster)  - 5.10.0-0.bpo.4-amd64 #1 SMP Debian 5.10.19-1~bpo10+1 (2021-03-13) x86_64 GNU/Linux
> Debian 11 (bullseye) - 5.11.22-3-pve #1 SMP PVE 5.11.22-6 (Wed, 28 Jul 2021 10:51:12 +0200) x86_64 GNU/Linux
> Debian 11 (bullseye) - 5.10.0-8-amd64 #1 SMP Debian 5.10.46-3 (2021-07-28) x86_64 GNU/Linux
>
> the proxmox one is at:
> ceph version 16.2.5 (9b9dd76e12f1907fe5dcc0c1fadadbb784022a42) pacific (stable)
>
> the rest is at:
> ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)
>
> All 3 other servers run all ceph commands fine,
> and 2 of them have a mon running fine too.
> The 3rd mon was on the bad one, and it fails to start with the same error.
>
> We stopped the bullseye update since we might kill a second server, which would be bad for the cluster ;)
>
> As soon as possible we will try going back to 5.10.0-0.bpo.4-amd64 on the bad one,
> since this is known to work on one of the other servers.
>
> The hardware of the servers cannot really be compared, since this is a testing / dev cluster composed of old hardware.
> We tried the update there first so as not to kill the production environment.
>
>
> On Sun, Aug 15, 2021 at 00:13, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>>
>> On Sat, Aug 14, 2021 at 7:13 PM Links 2004 <links2004.code@xxxxxxxxx> wrote:
>> >
>> > No kernel update recently, and it's a server so no keyboard etc., but entropy_avail looks good and is in the same range (±80) as on the other server.
>> > `dmesg | grep random` has no results.
>> > #  cat /proc/sys/kernel/random/entropy_avail
>> > 3547
>> >
>> > all servers run
>> > ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)
>>
>> Can you give us some more information?
>>
>> Do all of your servers fail all of the time or do some fail some of the time?
>>
>> What do the ones that fail have in common and how do they differ from
>> the ones that do not fail?
>>
>> If you have the option to try alternative kernels on the servers that
>> exhibit this behavior I'd suggest that might be a good next step.
>>
>> >
>> >
>> >
>> >
>> > On Sat, Aug 14, 2021 at 10:55, kefu chai <tchaikov@xxxxxxxxx> wrote:
>> >>
>> >>
>> >>
>> >> On Sat, Aug 14, 2021 at 06:11, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>> >>>
>> >>> On Sat, Aug 14, 2021 at 4:06 AM Links 2004 <links2004.code@xxxxxxxxx> wrote:
>> >>> >
>> >>> >
>> >>> > Hi,
>> >>> >
>> >>> > We are currently facing a strange problem on one of our Ceph nodes.
>> >>> > It is not possible to call `ceph -s` or start a mgr without getting a 'std::runtime_error'.
>> >>> >
>> >>> > Find below the error message and a gdb backtrace with debug symbols.
>> >>> > I hope this helps to understand the problem and point us in the right direction.
>> >>> >
>> >>> > Thanks
>> >>> >
>> >>> > Markus
>> >>> >
>> >>> >
>> >>> > OS: Debian buster
>> >>> > kernel: 5.10.0-0.bpo.8-amd64 #1 SMP Debian 5.10.46-2~bpo10+1 (2021-07-22) x86_64 GNU/Linux
>> >>> >
>> >>> > ```
>> >>> > # ceph -v
>> >>> > ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable)
>> >>> > ```
>> >>> >
>> >>> > ```
>> >>> > # ceph -s
>> >>> > terminate called after throwing an instance of 'std::runtime_error'
>> >>> >   what():  random_device::__x86_rdrand(void)
>> >>> > Aborted
>> >>> > ```
>> >>>
>> >>> Did this issue coincide with a kernel upgrade?
>> >>>
>> >>> Can you try and generate a lot of entropy on the system and see if the
>> >>> issue goes away?
>> >>>
>> >>> Also check the output of 'dmesg | grep random' to see if that offers any clues.
>> >>
>> >>
>> >> I feel the same. It looks likely that the kernel did not have enough entropy by then. Is the system not connected to a keyboard? Or was it just booted? If that's the case, you could probably wait a while before trying to launch the mgr or use the ceph command line utility.
>> >>>
>> >>>
>> >>>
>> >>> >
>> >>> > ```
>> >>> > # gdb --args /usr/bin/python3.7 /usr/bin/ceph -s
>> >>> > GNU gdb (Debian 8.2.1-2+b3) 8.2.1
>> >>> > Copyright (C) 2018 Free Software Foundation, Inc.
>> >>> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> >>> > This is free software: you are free to change and redistribute it.
>> >>> > There is NO WARRANTY, to the extent permitted by law.
>> >>> > Type "show copying" and "show warranty" for details.
>> >>> > This GDB was configured as "x86_64-linux-gnu".
>> >>> > Type "show configuration" for configuration details.
>> >>> > For bug reporting instructions, please see:
>> >>> > <http://www.gnu.org/software/gdb/bugs/>.
>> >>> > Find the GDB manual and other documentation resources online at:
>> >>> >     <http://www.gnu.org/software/gdb/documentation/>.
>> >>> >
>> >>> > For help, type "help".
>> >>> > Type "apropos word" to search for commands related to "word"...
>> >>> > Reading symbols from /usr/bin/python3.7...Reading symbols from /usr/lib/debug/.build-id/99/21c75e6930d3e9d9fa8c942aca9dc4500bb65f.debug...done.
>> >>> > done.
>> >>> > (gdb) run
>> >>> > Starting program: /usr/bin/python3.7 /usr/bin/ceph -s
>> >>> > [Thread debugging using libthread_db enabled]
>> >>> > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> >>> >
>> >>> > [New Thread 0x7fffed515700 (LWP 30323)]
>> >>> > [New Thread 0x7fffe7fff700 (LWP 30324)]
>> >>> > [New Thread 0x7fffe77fe700 (LWP 30325)]
>> >>> > [Thread 0x7fffed515700 (LWP 30323) exited]
>> >>> > [New Thread 0x7fffed515700 (LWP 30326)]
>> >>> > [Thread 0x7fffe7fff700 (LWP 30324) exited]
>> >>> > terminate called after throwing an instance of 'std::runtime_error'
>> >>> >   what():  random_device::__x86_rdrand(void)
>> >>> >
>> >>> > Thread 4 "python3.7" received signal SIGABRT, Aborted.
>> >>> > [Switching to Thread 0x7fffe77fe700 (LWP 30325)]
>> >>> > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>> >>> > 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> >>> > (gdb)
>> >>> > (gdb) bt
>> >>> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>> >>> > #1  0x00007ffff79de535 in __GI_abort () at abort.c:79
>> >>> > #2  0x00007fffeddb8983 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #3  0x00007fffeddbe8c6 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #4  0x00007fffeddbe901 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #5  0x00007fffeddbeb34 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #6  0x00007fffeddba8b7 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #7  0x00007fffedde6e86 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #8  0x00007fffedde6fd2 in std::random_device::_M_getval() () from /lib/x86_64-linux-gnu/libstdc++.so.6
>> >>> > #9  0x00007fffee540ffc in std::random_device::operator() (this=0x7fffe77fba60) at /usr/include/c++/8/bits/random.h:1611
>> >>> > #10 ceph::util::version_1_0_3::detail::randomize_rng<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > () at ./src/include/random.h:120
>> >>> > #11 0x00007fffee54112f in ceph::util::version_1_0_3::detail::engine<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > ()
>> >>> >     at /usr/include/c++/8/new:169
>> >>> > #12 0x00007fffee772911 in ceph::util::version_1_0_3::detail::generate_random_number<unsigned long, std::uniform_int_distribution<unsigned long>, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (min=min@entry=0, max=max@entry=18446744073709551615) at ./src/include/random.h:170
>> >>> > #13 0x00007fffee7718fe in ceph::util::version_1_0_3::generate_random_number<unsigned long, std::uniform_int_distribution<unsigned long>, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > () at ./src/include/random.h:203
>> >>> > #14 Messenger::get_random_nonce () at ./src/msg/Messenger.cc:33
>> >>> > #15 0x00007fffee771dd6 in Messenger::create_client_messenger (cct=0x7fffe80011c0, lname="temp_mon_client") at ./src/msg/Messenger.cc:16
>> >>> > #16 0x00007fffee82c862 in MonClient::get_monmap_and_config (this=this@entry=0x7fffe77fd080) at /usr/include/c++/8/ext/new_allocator.h:79
>> >>> > #17 0x00007ffff6fd9e14 in librados::v14_2_0::RadosClient::connect (this=this@entry=0x7fffe805ae80) at ./src/librados/RadosClient.cc:227
>> >>> > #18 0x00007ffff6f66a0f in _rados_connect (cluster=0x7fffe805ae80) at ./src/librados/librados_c.cc:204
>> >>> > #19 0x00007ffff71bf0b5 in ?? () from /usr/lib/python3/dist-packages/rados.cpython-37m-x86_64-linux-gnu.so
>> >>> > #20 0x00007ffff71380ec in ?? () from /usr/lib/python3/dist-packages/rados.cpython-37m-x86_64-linux-gnu.so
>> >>> > #21 0x00000000004d9850 in _PyObject_FastCallDict (kwargs={}, nargs=1, args=0x7fffe77fd880, callable=<cython_function_or_method at remote 0x7fffed6ae100>)
>> >>> >     at ../Objects/call.c:125
>> >>> > #22 _PyObject_Call_Prepend (kwargs={}, args=<optimized out>, obj=<optimized out>, callable=<cython_function_or_method at remote 0x7fffed6ae100>) at ../Objects/call.c:904
>> >>> > #23 method_call (method=<optimized out>, args=<optimized out>, kwargs=<optimized out>, method=<optimized out>, args=<optimized out>, kwargs=<optimized out>)
>> >>> >     at ../Objects/classobject.c:309
>> >>> > #24 0x00000000005dc4f6 in PyObject_Call (callable=<method at remote 0x7ffff7634dc8>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:245
>> >>> > #25 0x000000000054f987 in do_call_core (kwdict={}, callargs=(), func=<method at remote 0x7ffff7634dc8>) at ../Python/ceval.c:4645
>> >>> > #26 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3191
>> >>> > #27 0x00000000005d91fc in PyEval_EvalFrameEx (throwflag=0,
>> >>> >     f=Frame 0x7ffff72c6bb8, for file /usr/lib/python3/dist-packages/ceph_argparse.py, line 1458, in run (self=<RadosThread(args=(), kwargs={}, func=<method at remote 0x7ffff7634dc8>, exception=None, _target=None, _name='Thread-3', _args=(...), _kwargs={}, _daemonic=True, _ident=140737077307136, _tstate_lock=<_thread.lock at remote 0x7fffed63af58>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7fffed63ad00>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fffed63ad00>, release=<built-in method release of _thread.lock object at remote 0x7fffed63ad00>, _waiters=<collections.deque at remote 0x7fffed6d5a70>) at remote 0x7fffed533470>, _flag=True) at remote 0x7fffed533438>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7ffff7629708>) at remote 0x7fffed533358>)) at ../Python/ceval.c:547
>> >>> > #28 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
>> >>> > --Type <RET> for more, q to quit, c to continue without paging--c
>> >>> > #29 _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:408
>> >>> > #30 0x000000000054e7e0 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=0x7fffe77fdb30) at ../Python/ceval.c:4616
>> >>> > #31 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3110
>> >>> > #32 0x00000000005d91fc in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7fffe8000b38, for file /usr/lib/python3.7/threading.py, line 917, in _bootstrap_inner (self=<RadosThread(args=(), kwargs={}, func=<method at remote 0x7ffff7634dc8>, exception=None, _target=None, _name='Thread-3', _args=(...), _kwargs={}, _daemonic=True, _ident=140737077307136, _tstate_lock=<_thread.lock at remote 0x7fffed63af58>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7fffed63ad00>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fffed63ad00>, release=<built-in method release of _thread.lock object at remote 0x7fffed63ad00>, _waiters=<collections.deque at remote 0x7fffed6d5a70>) at remote 0x7fffed533470>, _flag=True) at remote 0x7fffed533438>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7ffff7629708>) at remote 0x7fffed533358>)) at ../Python/ceval.c:547
>> >>> > #33 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
>> >>> > #34 _PyFunction_FastCallKeywords (func=<optimized out>, stack=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:408
>> >>> > #35 0x000000000054e7e0 in call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=0x7fffe77fdcc0) at ../Python/ceval.c:4616
>> >>> > #36 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3110
>> >>> > #37 0x00000000005da536 in PyEval_EvalFrameEx (throwflag=0, f=Frame 0x7fffed538048, for file /usr/lib/python3.7/threading.py, line 885, in _bootstrap (self=<RadosThread(args=(), kwargs={}, func=<method at remote 0x7ffff7634dc8>, exception=None, _target=None, _name='Thread-3', _args=(...), _kwargs={}, _daemonic=True, _ident=140737077307136, _tstate_lock=<_thread.lock at remote 0x7fffed63af58>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7fffed63ad00>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fffed63ad00>, release=<built-in method release of _thread.lock object at remote 0x7fffed63ad00>, _waiters=<collections.deque at remote 0x7fffed6d5a70>) at remote 0x7fffed533470>, _flag=True) at remote 0x7fffed533438>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7ffff7629708>) at remote 0x7fffed533358>)) at ../Python/ceval.c:547
>> >>> > #38 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
>> >>> > #39 _PyFunction_FastCallDict (func=<optimized out>, args=<optimized out>, nargs=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:322
>> >>> > #40 0x00000000004d97e2 in _PyObject_FastCallDict (kwargs=0x0, nargs=1, args=0x7fffe77fde00, callable=<function at remote 0x7ffff73ead90>) at ../Objects/call.c:98
>> >>> > #41 _PyObject_Call_Prepend (kwargs=0x0, args=<optimized out>, obj=<RadosThread(args=(), kwargs={}, func=<method at remote 0x7ffff7634dc8>, exception=None, _target=None, _name='Thread-3', _args=(...), _kwargs={}, _daemonic=True, _ident=140737077307136, _tstate_lock=<_thread.lock at remote 0x7fffed63af58>, _started=<Event(_cond=<Condition(_lock=<_thread.lock at remote 0x7fffed63ad00>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fffed63ad00>, release=<built-in method release of _thread.lock object at remote 0x7fffed63ad00>, _waiters=<collections.deque at remote 0x7fffed6d5a70>) at remote 0x7fffed533470>, _flag=True) at remote 0x7fffed533438>, _is_stopped=False, _initialized=True, _stderr=<_io.TextIOWrapper at remote 0x7ffff7629708>) at remote 0x7fffed533358>, callable=<function at remote 0x7ffff73ead90>) at ../Objects/call.c:904
>> >>> > #42 method_call (method=<optimized out>, args=<optimized out>, kwargs=<optimized out>, method=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/classobject.c:309
>> >>> > #43 0x00000000005dc4f6 in PyObject_Call (callable=<method at remote 0x7ffff7634b08>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:245
>> >>> > #44 0x0000000000617b63 in t_bootstrap (boot_raw=boot_raw@entry=0x7fffed63ae40) at ../Modules/_threadmodule.c:994
>> >>> > #45 0x000000000062dfe4 in pythread_wrapper (arg=<optimized out>) at ../Python/thread_pthread.h:174
>> >>> > #46 0x00007ffff7f6efa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
>> >>> > #47 0x00007ffff7ab54cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>> >>> > ```
>> >>> > _______________________________________________
>> >>> > Dev mailing list -- dev@xxxxxxx
>> >>> > To unsubscribe send an email to dev-leave@xxxxxxx
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Cheers,
>> >>> Brad
>> >>>
>> >>
>> >> --
>> >> Regards
>> >> Kefu Chai
>>
>>
>>
>> --
>> Cheers,
>> Brad
>>
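
On the question of how the failing node differs from the others, the uname strings quoted above can be split into kernel release and package version for a side-by-side view. This is a rough sketch only: the regular expression and the helper names are my own assumptions about the layout of these four strings, not a general uname parser.

```python
import re

# (status, distribution, uname text) exactly as reported earlier in this thread.
HOSTS = [
    ("bad", "Debian 10 (buster)",
     "5.10.0-0.bpo.8-amd64 #1 SMP Debian 5.10.46-2~bpo10+1 (2021-07-22) x86_64 GNU/Linux"),
    ("good", "Debian 10 (buster)",
     "5.10.0-0.bpo.4-amd64 #1 SMP Debian 5.10.19-1~bpo10+1 (2021-03-13) x86_64 GNU/Linux"),
    ("good", "Debian 11 (bullseye)",
     "5.11.22-3-pve #1 SMP PVE 5.11.22-6 (Wed, 28 Jul 2021 10:51:12 +0200) x86_64 GNU/Linux"),
    ("good", "Debian 11 (bullseye)",
     "5.10.0-8-amd64 #1 SMP Debian 5.10.46-3 (2021-07-28) x86_64 GNU/Linux"),
]

# Assumed layout: "<release> #<n> SMP <vendor> <package version> (...)".
UNAME_RE = re.compile(r"^(?P<release>\S+) #\d+ SMP \S+ (?P<package>\S+) ")


def parse_uname(text):
    """Return (kernel release, package version) from one uname string."""
    match = UNAME_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised uname string: {text!r}")
    return match.group("release"), match.group("package")


def upstream_version(package):
    """Strip the packaging revision, e.g. '5.10.46-3' -> '5.10.46'."""
    return package.split("-", 1)[0]


if __name__ == "__main__":
    for status, distro, text in HOSTS:
        release, package = parse_uname(text)
        print(f"{status:5} {distro:21} {release:22} "
              f"{package:18} {upstream_version(package)}")
```

On this data the failing buster node and one working bullseye node both report upstream 5.10.46, so the kernel version may not be the only difference worth looking at; those two hosts also run different Debian releases.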


-- 
Cheers,
Brad
