On Tue, Feb 2, 2021 at 9:20 AM Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>
> bump

Can you create a tracker for this?

I'd suggest the first step would be working out what "NOTICE: invalid dest
placement: default-placement/REDUCED_REDUNDANCY" is trying to tell you (a rough
sketch of where I'd start is in the P.S. at the bottom of this mail). Someone
more familiar with rgw than I am should be able to tell you, so open the
tracker against rgw.

>
> ----- Original Message -----
> > From: "andrei" <andrei@xxxxxxxxxx>
> > To: "Daniel Gryniewicz" <dang@xxxxxxxxxx>
> > Cc: "ceph-users" <ceph-users@xxxxxxx>
> > Sent: Thursday, 28 January, 2021 17:07:00
> > Subject: Re: radosgw process crashes multiple times an hour
> >
> > Hi Daniel,
> >
> > Thanks for your reply. I've checked the package versions on that server and
> > all ceph-related packages on it are from version 15.2.8:
> >
> > ii librados2                      15.2.8-1focal amd64  RADOS distributed object store client library
> > ii libradosstriper1               15.2.8-1focal amd64  RADOS striping interface
> > ii python3-rados                  15.2.8-1focal amd64  Python 3 libraries for the Ceph librados library
> > ii radosgw                        15.2.8-1focal amd64  REST gateway for RADOS distributed object store
> > ii librbd1                        15.2.8-1focal amd64  RADOS block device client library
> > ii python3-rbd                    15.2.8-1focal amd64  Python 3 libraries for the Ceph librbd library
> > ii ceph                           15.2.8-1focal amd64  distributed storage and file system
> > ii ceph-base                      15.2.8-1focal amd64  common ceph daemon libraries and management tools
> > ii ceph-common                    15.2.8-1focal amd64  common utilities to mount and interact with a ceph storage cluster
> > ii ceph-fuse                      15.2.8-1focal amd64  FUSE-based client for the Ceph distributed file system
> > ii ceph-mds                       15.2.8-1focal amd64  metadata server for the ceph distributed file system
> > ii ceph-mgr                       15.2.8-1focal amd64  manager for the ceph distributed storage system
> > ii ceph-mgr-cephadm               15.2.8-1focal all    cephadm orchestrator module for ceph-mgr
> > ii ceph-mgr-dashboard             15.2.8-1focal all    dashboard module for ceph-mgr
> > ii ceph-mgr-diskprediction-cloud  15.2.8-1focal all    diskprediction-cloud module for ceph-mgr
> > ii ceph-mgr-diskprediction-local  15.2.8-1focal all    diskprediction-local module for ceph-mgr
> > ii ceph-mgr-k8sevents             15.2.8-1focal all    kubernetes events module for ceph-mgr
> > ii ceph-mgr-modules-core          15.2.8-1focal all    ceph manager modules which are always enabled
> > ii ceph-mgr-rook                  15.2.8-1focal all    rook module for ceph-mgr
> > ii ceph-mon                       15.2.8-1focal amd64  monitor server for the ceph storage system
> > ii ceph-osd                       15.2.8-1focal amd64  OSD server for the ceph storage system
> > ii cephadm                        15.2.8-1focal amd64  cephadm utility to bootstrap ceph daemons with systemd and containers
> > ii libcephfs2                     15.2.8-1focal amd64  Ceph distributed file system client library
> > ii python3-ceph                   15.2.8-1focal amd64  Meta-package for python libraries for the Ceph libraries
> > ii python3-ceph-argparse          15.2.8-1focal all    Python 3 utility libraries for Ceph CLI
> > ii python3-ceph-common            15.2.8-1focal all    Python 3 utility libraries for Ceph
> > ii python3-cephfs                 15.2.8-1focal amd64  Python 3 libraries for the Ceph libcephfs library
> >
> > As this is a brand-new 20.04 server, I do not see how an older version could
> > have got onto it.
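
To rule a stray library in or out at runtime (rather than only by package list),
it may be worth checking which librados the gateway has actually loaded. A rough
sketch; the binary path and the single-instance pidof are assumptions, adjust
for your setup:

    # library the dynamic linker resolves for the radosgw binary
    ldd /usr/bin/radosgw | grep librados

    # library the *running* process actually has mapped
    pid=$(pidof radosgw)
    grep librados "/proc/${pid}/maps" | awk '{ print $6 }' | sort -u

    # files shipped by the package vs. any other copies on disk
    dpkg -L librados2 | grep 'librados\.so'
    find /usr /opt -name 'librados.so*' 2>/dev/null

If those all point at the 15.2.8 copy, the librados::v14_2_0 in the backtrace
may simply be the library's versioned ABI namespace rather than evidence of an
old library lying around.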
> >
> > Andrei
> >
> >
> > ----- Original Message -----
> >> From: "Daniel Gryniewicz" <dang@xxxxxxxxxx>
> >> To: "ceph-users" <ceph-users@xxxxxxx>
> >> Sent: Thursday, 28 January, 2021 14:06:16
> >> Subject: Re: radosgw process crashes multiple times an hour
> >>
> >> It looks like your radosgw is using a different version of librados. In
> >> the backtrace, the top useful line begins:
> >>
> >>     librados::v14_2_0
> >>
> >> when it should be v15_2_0, like the ceph::buffer in the same line.
> >>
> >> Is there an old librados lying around that didn't get cleaned up somehow?
> >>
> >> Daniel
> >>
> >>
> >>
> >> On 1/28/21 7:27 AM, Andrei Mikhailovsky wrote:
> >>> Hello,
> >>>
> >>> I am experiencing very frequent crashes of the radosgw service. It happens
> >>> multiple times every hour; as an example, over the last 12 hours we have had
> >>> 35 crashes. Has anyone experienced similar behaviour with the radosgw service
> >>> in the Octopus release? More info below:
> >>>
> >>> The radosgw service is running on two Ubuntu servers. I have tried upgrading
> >>> the OS on one of the servers to Ubuntu 20.04 with the latest updates. The
> >>> second server is still running Ubuntu 18.04. Both services crash occasionally,
> >>> but the one running on Ubuntu 20.04 seems to crash far more often. The ceph
> >>> cluster itself is pretty old and was initially set up around 2013. The cluster
> >>> has been updated regularly with every major release. Currently, I've got
> >>> Octopus 15.2.8 running on all osd, mon, mgr and radosgw servers.
> >>>
> >>> Crash Backtrace:
> >>>
> >>> ceph crash info 2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8 |less
> >>> {
> >>>     "backtrace": [
> >>>         "(()+0x46210) [0x7f815a49a210]",
> >>>         "(gsignal()+0xcb) [0x7f815a49a18b]",
> >>>         "(abort()+0x12b) [0x7f815a479859]",
> >>>         "(()+0x9e951) [0x7f8150ee9951]",
> >>>         "(()+0xaa47c) [0x7f8150ef547c]",
> >>>         "(()+0xaa4e7) [0x7f8150ef54e7]",
> >>>         "(()+0xaa799) [0x7f8150ef5799]",
> >>>         "(()+0x344ba) [0x7f815a1404ba]",
> >>>         "(()+0x71e04) [0x7f815a17de04]",
> >>>         "(librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor const&, ceph::buffer::v15_2_0::list const&)+0x5d) [0x7f815a18c7bd]",
> >>>         "(RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWAccessListFilter*)+0x115) [0x7f815b0d9935]",
> >>>         "(RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x255) [0x7f815abd7035]",
> >>>         "(RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x206) [0x7f815b0ccfe6]",
> >>>         "(RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x41) [0x7f815ad23201]",
> >>>         "(RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x71) [0x7f815ad254d1]",
> >>>         "(AsyncMetadataList::_send_request()+0x9b) [0x7f815b13c70b]",
> >>>         "(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x25) [0x7f815ae60f25]",
>>> "(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, > >>> ThreadPool::TPHandle&)+0x11) [0x7f815ae69401]", > >>> "(ThreadPool::worker(ThreadPool::WorkThread*)+0x5bb) [0x7f81517b072b]", > >>> "(ThreadPool::WorkThread::entry()+0x15) [0x7f81517b17f5]", > >>> "(()+0x9609) [0x7f815130d609]", > >>> "(clone()+0x43) [0x7f815a576293]" > >>> ], > >>> "ceph_version": "15.2.8", > >>> "crash_id": "2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8", > >>> "entity_name": "client.radosgw1.gateway", > >>> "os_id": "ubuntu", > >>> "os_name": "Ubuntu", > >>> "os_version": "20.04.1 LTS (Focal Fossa)", > >>> "os_version_id": "20.04", > >>> "process_name": "radosgw", > >>> "stack_sig": "347474f09a756104ac2bb99d80e0c1fba3e9dc6f26e4ef68fe55946c103b274a", > >>> "timestamp": "2021-01-28T11:36:48.912771Z", > >>> "utsname_hostname": "arh-ibstorage1-ib", > >>> "utsname_machine": "x86_64", > >>> "utsname_release": "5.4.0-64-generic", > >>> "utsname_sysname": "Linux", > >>> "utsname_version": "#72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021" > >>> } > >>> > >>> > >>> > >>> > >>> > >>> radosgw.log file (file names were redacted): > >>> > >>> > >>> -25> 2021-01-28T11:36:48.794+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: > >>> 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-u115134.JPG > >>> HTTP/1.1" 400 460 - - > >>> -24> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== starting new request > >>> req=0x7f80437f5780 ===== > >>> -23> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s initializing for > >>> trans_id = tx000000000000000001431-006012a1d0-31197b5c-default > >>> -22> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s getting op 1 > >>> -21> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj > >>> verifying requester > >>> -20> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj > >>> normalizing buckets and tenants > >>> -19> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj init > >>> permissions > >>> -18> 2021-01-28T11:36:48.814+0000 7f80437fe700 0 req 5169 0s NOTICE: invalid > >>> dest placement: default-placement/REDUCED_REDUNDANCY > >>> -17> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 op->ERRORHANDLER: err_no=-22 > >>> new_err_no=-22 > >>> -16> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj op > >>> status=0 > >>> -15> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj http > >>> status=400 > >>> -14> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== req done > >>> req=0x7f80437f5780 op status=0 http_status=400 latency=0s ====== > >>> -13> 2021-01-28T11:36:48.822+0000 7f80437fe700 1 civetweb: 0x7f814c0cf9e8: > >>> 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT > >>> /<file_name>-d20201223-u115132.JPG HTTP/1.1" 400 460 - - > >>> -12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request > >>> req=0x7f8043ff6780 ===== > >>> -11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for > >>> trans_id = tx000000000000000001432-006012a1d0-31197b5c-default > >>> -10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1 > >>> -9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying > >>> requester > >>> -8> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj > >>> normalizing buckets and tenants > >>> -12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request > >>> req=0x7f8043ff6780 ===== > >>> -11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s 
> >>>     -7> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj init permissions
> >>>     -6> 2021-01-28T11:36:48.878+0000 7f8043fff700 0 req 5170 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY
> >>>     -5> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22
> >>>     -4> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj op status=0
> >>>     -3> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj http status=400
> >>>     -2> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== req done req=0x7f8043ff6780 op status=0 http_status=400 latency=0s ======
> >>>     -1> 2021-01-28T11:36:48.886+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-223-u115136.JPG HTTP/1.1" 400 460 - -
> >>>      0> 2021-01-28T11:36:48.910+0000 7f8128ff9700 -1 *** Caught signal (Aborted) **
> >>>
> >>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 deferred set uid:gid to 64045:64045 (ceph:ceph)
> >>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw, pid 30417
> >>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework: civetweb
> >>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework conf key: port, val: 443s
> >>>
> >>>
> >>> Could someone help me troubleshoot and fix the issue?
> >>>
> >>> Thanks
> >>> Andrei
>

--
Cheers,
Brad
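
P.S. On the "NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY"
entries themselves: those look like PUTs arriving with the S3 REDUCED_REDUNDANCY
storage class while the default-placement target only defines STANDARD, which
would explain the 400s on those requests. A rough sketch of where I'd start
looking (the zonegroup/zone/pool names below are just the usual defaults, and
<uid> is a placeholder, substitute your own):

    # see which placement targets and storage classes are actually defined
    radosgw-admin zonegroup placement list
    radosgw-admin zone placement list

    # check the default placement of the user doing the uploads
    radosgw-admin user info --uid=<uid>

    # if you want to accept REDUCED_REDUNDANCY, define it as a storage class;
    # otherwise have the clients request STANDARD instead
    radosgw-admin zonegroup placement add --rgw-zonegroup default \
        --placement-id default-placement --storage-class REDUCED_REDUNDANCY
    radosgw-admin zone placement add --rgw-zone default \
        --placement-id default-placement --storage-class REDUCED_REDUNDANCY \
        --data-pool default.rgw.buckets.data
    radosgw-admin period update --commit   # only if a realm/period is configured

Whether those notices are related to the crash in AsyncMetadataList is a
separate question, which is another reason to open the tracker with full logs.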