Reef: rgw daemon crashes

Hi,

yesterday I upgraded a customer cluster to Reef (18.2.4). The upgrade itself went quite well; nothing happened for hours, until it did. One of the two RGW daemons has crashed twice in the last 12 hours. Here's one backtrace:

---snip---
Feb 06 23:19:58 storage09 conmon[2501983]: radosgw: /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/18.2.4/rpm/el9/BUILD/ceph-18.2.4/redhat-linux-build/boost/include/boost/context/posix/protected_fixedsize_stack.hpp:70: boost::context::stack_context boost::context::basic_protected_fixedsize_stack<traitsT>::allocate() [with traitsT = boost::context::stack_traits]: Assertion `0 == result' failed.
Feb 06 23:19:58 storage09 conmon[2501983]: *** Caught signal (Aborted) **
Feb 06 23:19:58 storage09 conmon[2501983]: in thread 7f77a10b2640 thread_name:radosgw
Feb 06 23:19:58 storage09 conmon[2501983]: ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)
Feb 06 23:19:58 storage09 conmon[2501983]: 1: /lib64/libc.so.6(+0x3e6f0) [0x7f78adeba6f0]
Feb 06 23:19:58 storage09 conmon[2501983]: 2: /lib64/libc.so.6(+0x8b94c) [0x7f78adf0794c]
Feb 06 23:19:58 storage09 conmon[2501983]:  3: raise()
Feb 06 23:19:58 storage09 conmon[2501983]:  4: abort()
Feb 06 23:19:58 storage09 conmon[2501983]: 5: /lib64/libc.so.6(+0x2871b) [0x7f78adea471b]
Feb 06 23:19:58 storage09 conmon[2501983]: 6: /lib64/libc.so.6(+0x37386) [0x7f78adeb3386]
Feb 06 23:19:58 storage09 conmon[2501983]: 7: /usr/bin/radosgw(+0x361cb2) [0x561ed9ef3cb2]
Feb 06 23:19:58 storage09 conmon[2501983]: 8: /usr/bin/radosgw(+0x361db8) [0x561ed9ef3db8]
Feb 06 23:19:58 storage09 conmon[2501983]: 9: /usr/bin/radosgw(+0x36e15e) [0x561ed9f0015e]
Feb 06 23:19:58 storage09 conmon[2501983]: 10: /usr/bin/radosgw(+0x357558) [0x561ed9ee9558]
Feb 06 23:19:58 storage09 conmon[2501983]: 11: /usr/bin/radosgw(+0x34546c) [0x561ed9ed746c]
Feb 06 23:19:58 storage09 conmon[2501983]: 12: /usr/bin/radosgw(+0x358f0a) [0x561ed9eeaf0a]
Feb 06 23:19:58 storage09 conmon[2501983]: 13: /usr/bin/radosgw(+0xb705de) [0x561eda7025de]
Feb 06 23:19:58 storage09 conmon[2501983]: 14: /usr/bin/radosgw(+0x3c6aed) [0x561ed9f58aed]
Feb 06 23:19:58 storage09 conmon[2501983]: 15: /lib64/libstdc++.so.6(+0xdbad4) [0x7f78ae258ad4]
Feb 06 23:19:58 storage09 conmon[2501983]: 16: /lib64/libc.so.6(+0x89c02) [0x7f78adf05c02]
Feb 06 23:19:58 storage09 conmon[2501983]: 17: /lib64/libc.so.6(+0x10ec40) [0x7f78adf8ac40]
---snip---
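
If I read the Boost headers correctly, the failing assertion at protected_fixedsize_stack.hpp:70 sits in the coroutine stack allocator (the beast frontend spawns a coroutine per request, as far as I understand), and it checks the return code of mprotect() on the stack's guard page. Paraphrased and simplified, not the verbatim Boost source, the allocate() around that line does roughly this:

---snip---
// Simplified sketch of boost::context::protected_fixedsize_stack::allocate()
// (paraphrased from the Boost headers, not the exact code).
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <new>

struct stack_context {
    std::size_t size = 0;
    void *      sp   = nullptr;
};

stack_context allocate_protected_stack(std::size_t requested_size) {
    const std::size_t page_size = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));

    // round the requested size up to whole pages and add one extra guard page
    const std::size_t pages = (requested_size + page_size - 1) / page_size;
    const std::size_t size  = (pages + 1) * page_size;

    void *vp = ::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (vp == MAP_FAILED)
        throw std::bad_alloc();

    // make the lowest page inaccessible so a stack overflow faults immediately;
    // the "Assertion `0 == result' failed." in the log corresponds to this check
    const int result = ::mprotect(vp, page_size, PROT_NONE);
    assert(0 == result);

    stack_context sctx;
    sctx.size = size;
    sctx.sp   = static_cast<char *>(vp) + size;  // stack grows downwards
    return sctx;
}
---snip---

According to the mprotect(2) man page, that call can fail with ENOMEM when the process would exceed its maximum number of memory mappings, so this might be resource exhaustion rather than a logic bug, but that's just a guess on my part.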

I didn't find anything helpful in the tracker, only a report on this list from a year ago [0] that never got a response. The other daemon on a different host seems to be stable for now. This is not a multi-site deployment, just two RGWs serving a single zone. Any comments or pointers are appreciated! I can file a tracker issue if this is something new.

Thanks!
Eugen

[0] https://www.spinics.net/lists/ceph-users/msg80956.html
