radosgw process crashes multiple times an hour

Hello, 

I am seeing very frequent crashes of the radosgw service, several every hour: over the last 12 hours we have had 35 crashes. Has anyone seen similar behaviour with radosgw on the Octopus release? More info below:

The radosgw service runs on two Ubuntu servers. I upgraded the OS on one of them to Ubuntu 20.04 with the latest updates; the second server is still running Ubuntu 18.04. The service crashes on both servers, but it seems to crash far more often on the one running Ubuntu 20.04. The Ceph cluster itself is fairly old: it was initially set up around 2013 and has been upgraded regularly with every major release. Currently, Octopus 15.2.8 is running on all osd, mon, mgr and radosgw servers.

Crash Backtrace: 

ceph crash info 2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8 |less 
{ 
"backtrace": [ 
"(()+0x46210) [0x7f815a49a210]", 
"(gsignal()+0xcb) [0x7f815a49a18b]", 
"(abort()+0x12b) [0x7f815a479859]", 
"(()+0x9e951) [0x7f8150ee9951]", 
"(()+0xaa47c) [0x7f8150ef547c]", 
"(()+0xaa4e7) [0x7f8150ef54e7]", 
"(()+0xaa799) [0x7f8150ef5799]", 
"(()+0x344ba) [0x7f815a1404ba]", 
"(()+0x71e04) [0x7f815a17de04]", 
"(librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor const&, ceph::buffer::v15_2_0::list const&)+0x5d) [0x7f815a18c7bd]", 
"(RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWAccessListFilter*)+0x115) [0x7f815b0d9935]", 
"(RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x255) [0x7f815abd7035]", 
"(RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x206) [0x7f815b0ccfe6]", 
"(RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x41) [0x7f815ad23201]", 
"(RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x71) [0x7f815ad254d1]", 
"(AsyncMetadataList::_send_request()+0x9b) [0x7f815b13c70b]", 
"(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x25) [0x7f815ae60f25]", 
"(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x11) [0x7f815ae69401]", 
"(ThreadPool::worker(ThreadPool::WorkThread*)+0x5bb) [0x7f81517b072b]", 
"(ThreadPool::WorkThread::entry()+0x15) [0x7f81517b17f5]", 
"(()+0x9609) [0x7f815130d609]", 
"(clone()+0x43) [0x7f815a576293]" 
], 
"ceph_version": "15.2.8", 
"crash_id": "2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8", 
"entity_name": "client.radosgw1.gateway", 
"os_id": "ubuntu", 
"os_name": "Ubuntu", 
"os_version": "20.04.1 LTS (Focal Fossa)", 
"os_version_id": "20.04", 
"process_name": "radosgw", 
"stack_sig": "347474f09a756104ac2bb99d80e0c1fba3e9dc6f26e4ef68fe55946c103b274a", 
"timestamp": "2021-01-28T11:36:48.912771Z", 
"utsname_hostname": "arh-ibstorage1-ib", 
"utsname_machine": "x86_64", 
"utsname_release": "5.4.0-64-generic", 
"utsname_sysname": "Linux", 
"utsname_version": "#72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021" 
} 
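
With 35 reports in 12 hours, it may help to check whether they all share one stack signature and how they split between the two OS versions. The following is only a minimal sketch: it assumes each report has been saved as the JSON text printed by `ceph crash info <crash_id>`, and it reads only the `stack_sig` and `os_version_id` fields shown in the report above. The function name `summarize_crashes` is hypothetical.

```python
import json
from collections import Counter


def summarize_crashes(reports):
    """Count crash reports per (stack_sig, os_version_id) pair.

    `reports` is an iterable of JSON strings, each assumed to be the saved
    output of `ceph crash info <crash_id>`.
    """
    counts = Counter()
    for raw in reports:
        info = json.loads(raw)
        # Only fields visible in the report above are read; a missing
        # field is counted under "unknown" instead of raising an error.
        key = (
            info.get("stack_sig", "unknown"),
            info.get("os_version_id", "unknown"),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample input built from the values in the report above.
    sample = json.dumps({
        "stack_sig": "347474f09a756104ac2bb99d80e0c1fba3e9dc6f26e4ef68fe55946c103b274a",
        "os_version_id": "20.04",
    })
    for (sig, os_ver), n in summarize_crashes([sample]).most_common():
        print(n, os_ver, sig[:12])
```

If every report falls under the same signature, the crashes most likely share the code path shown in the backtrace; several distinct signatures would suggest more than one problem.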





radosgw.log file (file names were redacted): 


-25> 2021-01-28T11:36:48.794+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-u115134.JPG HTTP/1.1" 400 460 - - 
-24> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== starting new request req=0x7f80437f5780 ===== 
-23> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s initializing for trans_id = tx000000000000000001431-006012a1d0-31197b5c-default 
-22> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s getting op 1 
-21> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj verifying requester 
-20> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj normalizing buckets and tenants 
-19> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj init permissions 
-18> 2021-01-28T11:36:48.814+0000 7f80437fe700 0 req 5169 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY 
-17> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22 
-16> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj op status=0 
-15> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj http status=400 
-14> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== req done req=0x7f80437f5780 op status=0 http_status=400 latency=0s ====== 
-13> 2021-01-28T11:36:48.822+0000 7f80437fe700 1 civetweb: 0x7f814c0cf9e8: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-d20201223-u115132.JPG HTTP/1.1" 400 460 - - 
-12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request req=0x7f8043ff6780 ===== 
-11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for trans_id = tx000000000000000001432-006012a1d0-31197b5c-default 
-10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1 
-9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying requester 
-8> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj normalizing buckets and tenants 
-7> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj init permissions 
-6> 2021-01-28T11:36:48.878+0000 7f8043fff700 0 req 5170 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY 
-5> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22 
-4> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj op status=0 
-3> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj http status=400 
-2> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== req done req=0x7f8043ff6780 op status=0 http_status=400 latency=0s ====== 

-1> 2021-01-28T11:36:48.886+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-223-u115136.JPG HTTP/1.1" 400 460 - - 
0> 2021-01-28T11:36:48.910+0000 7f8128ff9700 -1 *** Caught signal (Aborted) ** 
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 deferred set uid:gid to 64045:64045 (ceph:ceph) 
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw, pid 30417 
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework: civetweb 
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework conf key: port, val: 443s 
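
To judge whether the rejected PUT requests are related to the abort or just happen to be logged next to it, the excerpt can be scanned for the thread IDs involved. This is a rough sketch that assumes the line layout seen in the excerpt above (optional `-N>` index, timestamp, thread ID, log level, message); the regular expressions and the name `scan_log` are hypothetical and not part of any Ceph tool.

```python
import re
from collections import Counter

# Assumed layout: [-N> ]<timestamp> <thread id> <level> <message>
LINE_RE = re.compile(
    r"^\s*(?:-?\d+>\s+)?(?P<ts>\S+)\s+(?P<thread>[0-9a-f]+)\s+"
    r"(?P<level>-?\d+)\s+(?P<msg>.*)$"
)
PLACEMENT_RE = re.compile(r"invalid dest placement: (\S+)")


def scan_log(lines):
    """Return (placement notice counts, notice thread IDs, signal thread IDs)."""
    placements = Counter()
    request_threads = set()
    signal_threads = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # blank or unrecognised line
        msg = m.group("msg")
        p = PLACEMENT_RE.search(msg)
        if p:
            # Tally each rejected placement and remember which thread logged it.
            placements[p.group(1)] += 1
            request_threads.add(m.group("thread"))
        elif "Caught signal" in msg:
            signal_threads.append(m.group("thread"))
    return placements, request_threads, signal_threads


if __name__ == "__main__":
    import sys
    placements, req_threads, sig_threads = scan_log(sys.stdin)
    print("placement notices:", dict(placements))
    print("threads logging notices:", sorted(req_threads))
    print("threads logging a signal:", sig_threads)
```

In the excerpt above, the signal is logged by thread 7f8128ff9700, while the placement notices come from 7f80437fe700 and 7f8043fff700, so the 400 responses may be unrelated to the crash itself.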


Could someone help me troubleshoot and fix the issue? 

Thanks 
Andrei 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


