Re: radosgw process crashes multiple times an hour

On Tue, Feb 2, 2021 at 9:20 AM Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>
> bump

Can you create a tracker for this?

I'd suggest the first step is working out what "NOTICE: invalid
dest placement: default-placement/REDUCED_REDUNDANCY" is trying to
tell you. Someone more familiar with rgw than I am should be able to
help, so open the tracker against rgw.
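
For what it's worth, that notice usually means the client asked for a
storage class (REDUCED_REDUNDANCY, presumably via an x-amz-storage-class
header) that isn't defined in the zonegroup's placement target. A rough
sketch of how you could check, and define the class if you actually want
to support it -- the data pool name below is an assumption, substitute
your own:

    # show the placement targets and the storage classes they define
    radosgw-admin zonegroup placement list

    # add the missing storage class to the zonegroup and zone
    radosgw-admin zonegroup placement add --rgw-zonegroup default \
        --placement-id default-placement --storage-class REDUCED_REDUNDANCY
    radosgw-admin zone placement add --rgw-zone default \
        --placement-id default-placement \
        --storage-class REDUCED_REDUNDANCY \
        --data-pool default.rgw.buckets.data
    # (in a multisite setup, follow with: radosgw-admin period update --commit)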

>
>
> ----- Original Message -----
> > From: "andrei" <andrei@xxxxxxxxxx>
> > To: "Daniel Gryniewicz" <dang@xxxxxxxxxx>
> > Cc: "ceph-users" <ceph-users@xxxxxxx>
> > Sent: Thursday, 28 January, 2021 17:07:00
> > Subject:  Re: radosgw process crashes multiple times an hour
>
> > Hi Daniel,
> >
> > Thanks for your reply. I've checked the package versions on that server,
> > and all Ceph-related packages on it are from version 15.2.8:
> >
> > ii  librados2        15.2.8-1focal amd64        RADOS distributed object store
> > client library
> > ii  libradosstriper1 15.2.8-1focal amd64        RADOS striping interface
> > ii  python3-rados    15.2.8-1focal amd64        Python 3 libraries for the Ceph
> > librados library
> > ii  radosgw          15.2.8-1focal amd64        REST gateway for RADOS
> > distributed object store
> > ii  librbd1        15.2.8-1focal amd64        RADOS block device client library
> > ii  python3-rbd    15.2.8-1focal amd64        Python 3 libraries for the Ceph
> > librbd library
> > ii  ceph                          15.2.8-1focal amd64        distributed storage
> > and file system
> > ii  ceph-base                     15.2.8-1focal amd64        common ceph daemon
> > libraries and management tools
> > ii  ceph-common                   15.2.8-1focal amd64        common utilities to
> > mount and interact with a ceph storage cluster
> > ii  ceph-fuse                     15.2.8-1focal amd64        FUSE-based client
> > for the Ceph distributed file system
> > ii  ceph-mds                      15.2.8-1focal amd64        metadata server for
> > the ceph distributed file system
> > ii  ceph-mgr                      15.2.8-1focal amd64        manager for the
> > ceph distributed storage system
> > ii  ceph-mgr-cephadm              15.2.8-1focal all          cephadm
> > orchestrator module for ceph-mgr
> > ii  ceph-mgr-dashboard            15.2.8-1focal all          dashboard module
> > for ceph-mgr
> > ii  ceph-mgr-diskprediction-cloud 15.2.8-1focal all
> > diskprediction-cloud module for ceph-mgr
> > ii  ceph-mgr-diskprediction-local 15.2.8-1focal all
> > diskprediction-local module for ceph-mgr
> > ii  ceph-mgr-k8sevents            15.2.8-1focal all          kubernetes events
> > module for ceph-mgr
> > ii  ceph-mgr-modules-core         15.2.8-1focal all          ceph manager
> > modules which are always enabled
> > ii  ceph-mgr-rook                 15.2.8-1focal all          rook module for
> > ceph-mgr
> > ii  ceph-mon                      15.2.8-1focal amd64        monitor server for
> > the ceph storage system
> > ii  ceph-osd                      15.2.8-1focal amd64        OSD server for the
> > ceph storage system
> > ii  cephadm                       15.2.8-1focal amd64        cephadm utility to
> > bootstrap ceph daemons with systemd and containers
> > ii  libcephfs2                    15.2.8-1focal amd64        Ceph distributed
> > file system client library
> > ii  python3-ceph                  15.2.8-1focal amd64        Meta-package for
> > python libraries for the Ceph libraries
> > ii  python3-ceph-argparse         15.2.8-1focal all          Python 3 utility
> > libraries for Ceph CLI
> > ii  python3-ceph-common           15.2.8-1focal all          Python 3 utility
> > libraries for Ceph
> > ii  python3-cephfs                15.2.8-1focal amd64        Python 3 libraries
> > for the Ceph libcephfs library
> >
> > As this is a brand-new 20.04 server, I do not see how an older version
> > could have got onto it.
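> >
> > In case it helps, a couple of quick sanity checks that the running
> > radosgw really maps the 15.2.x library and not a stale copy (just one
> > way of doing it, assuming a single radosgw process on the box):
> >
> >     # which librados the dynamic linker resolves for the binary
> >     ldd $(which radosgw) | grep librados
> >     # which librados file the live process actually has mapped
> >     lsof -p $(pidof radosgw) | grep librados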
> >
> > Andrei
> >
> >
> > ----- Original Message -----
> >> From: "Daniel Gryniewicz" <dang@xxxxxxxxxx>
> >> To: "ceph-users" <ceph-users@xxxxxxx>
> >> Sent: Thursday, 28 January, 2021 14:06:16
> >> Subject:  Re: radosgw process crashes multiple times an hour
> >
> >> It looks like your radosgw is using a different version of librados.  In
> >> the backtrace, the top useful line begins:
> >>
> >> librados::v14_2_0
> >>
> >> when it should be v15_2_0, like the ceph::buffer in the same line.
> >>
> >> Is there an old librados lying around that didn't get cleaned up somehow?
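> >>
> >> A quick way to look for one, assuming the usual Ubuntu multiarch path:
> >>
> >>     # all librados copies the dynamic linker knows about
> >>     ldconfig -p | grep librados.so.2
> >>     # which package owns the resolved file
> >>     dpkg -S /usr/lib/x86_64-linux-gnu/librados.so.2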
> >>
> >> Daniel
> >>
> >>
> >>
> >> On 1/28/21 7:27 AM, Andrei Mikhailovsky wrote:
> >>> Hello,
> >>>
> >>> I am experiencing very frequent crashes of the radosgw service, multiple
> >>> times every hour; as an example, over the last 12 hours we've had 35
> >>> crashes. Has anyone experienced similar behaviour with the Octopus
> >>> release of radosgw? More info below:
> >>>
> >>> The radosgw service is running on two Ubuntu servers. I have upgraded
> >>> the OS on one of them to Ubuntu 20.04 with the latest updates; the
> >>> second server is still running Ubuntu 18.04. Both services crash
> >>> occasionally, but the one running on Ubuntu 20.04 seems to crash far
> >>> more often. The Ceph cluster itself is pretty old and was initially set
> >>> up around 2013, and it has been updated regularly with every major
> >>> release. Currently I've got Octopus 15.2.8 running on all osd, mon, mgr
> >>> and radosgw servers.
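> >>>
> >>> (For reference, the crash module records each of these; something like
> >>> this lists them, and each ID can be passed to 'ceph crash info' for the
> >>> full report, as below:)
> >>>
> >>>     # list crash reports recorded by the cluster
> >>>     ceph crash ls
> >>>     # full report for a single crash
> >>>     ceph crash info <crash_id>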
> >>>
> >>> Crash Backtrace:
> >>>
> >>> ceph crash info 2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8
> >>> |less
> >>> {
> >>> "backtrace": [
> >>> "(()+0x46210) [0x7f815a49a210]",
> >>> "(gsignal()+0xcb) [0x7f815a49a18b]",
> >>> "(abort()+0x12b) [0x7f815a479859]",
> >>> "(()+0x9e951) [0x7f8150ee9951]",
> >>> "(()+0xaa47c) [0x7f8150ef547c]",
> >>> "(()+0xaa4e7) [0x7f8150ef54e7]",
> >>> "(()+0xaa799) [0x7f8150ef5799]",
> >>> "(()+0x344ba) [0x7f815a1404ba]",
> >>> "(()+0x71e04) [0x7f815a17de04]",
> >>> "(librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor
> >>> const&, ceph::buffer::v15_2_0::list const&)+0x5d) [0x7f815a18c7bd]",
> >>> "(RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char,
> >>> std::char_traits<char>, std::allocator<char> > const&,
> >>> RGWAccessListFilter*)+0x115) [0x7f815b0d9935]",
> >>> "(RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&,
> >>> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
> >>> const&, std::__cxx11::basic_string<char, std::char_traits<char>,
> >>> std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x255)
> >>> [0x7f815abd7035]",
> >>> "(RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*,
> >>> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
> >>> const&)+0x206) [0x7f815b0ccfe6]",
> >>> "(RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char,
> >>> std::char_traits<char>, std::allocator<char> > const&, void**)+0x41)
> >>> [0x7f815ad23201]",
> >>> "(RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char,
> >>> std::char_traits<char>, std::allocator<char> > const&,
> >>> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
> >>> const&, void**)+0x71) [0x7f815ad254d1]",
> >>> "(AsyncMetadataList::_send_request()+0x9b) [0x7f815b13c70b]",
> >>> "(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x25)
> >>> [0x7f815ae60f25]",
> >>> "(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*,
> >>> ThreadPool::TPHandle&)+0x11) [0x7f815ae69401]",
> >>> "(ThreadPool::worker(ThreadPool::WorkThread*)+0x5bb) [0x7f81517b072b]",
> >>> "(ThreadPool::WorkThread::entry()+0x15) [0x7f81517b17f5]",
> >>> "(()+0x9609) [0x7f815130d609]",
> >>> "(clone()+0x43) [0x7f815a576293]"
> >>> ],
> >>> "ceph_version": "15.2.8",
> >>> "crash_id": "2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8",
> >>> "entity_name": "client.radosgw1.gateway",
> >>> "os_id": "ubuntu",
> >>> "os_name": "Ubuntu",
> >>> "os_version": "20.04.1 LTS (Focal Fossa)",
> >>> "os_version_id": "20.04",
> >>> "process_name": "radosgw",
> >>> "stack_sig": "347474f09a756104ac2bb99d80e0c1fba3e9dc6f26e4ef68fe55946c103b274a",
> >>> "timestamp": "2021-01-28T11:36:48.912771Z",
> >>> "utsname_hostname": "arh-ibstorage1-ib",
> >>> "utsname_machine": "x86_64",
> >>> "utsname_release": "5.4.0-64-generic",
> >>> "utsname_sysname": "Linux",
> >>> "utsname_version": "#72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021"
> >>> }
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> radosgw.log file (file names were redacted):
> >>>
> >>>
> >>> -25> 2021-01-28T11:36:48.794+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010:
> >>> 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-u115134.JPG
> >>> HTTP/1.1" 400 460 - -
> >>> -24> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== starting new request
> >>> req=0x7f80437f5780 =====
> >>> -23> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s initializing for
> >>> trans_id = tx000000000000000001431-006012a1d0-31197b5c-default
> >>> -22> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s getting op 1
> >>> -21> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj
> >>> verifying requester
> >>> -20> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj
> >>> normalizing buckets and tenants
> >>> -19> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj init
> >>> permissions
> >>> -18> 2021-01-28T11:36:48.814+0000 7f80437fe700 0 req 5169 0s NOTICE: invalid
> >>> dest placement: default-placement/REDUCED_REDUNDANCY
> >>> -17> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 op->ERRORHANDLER: err_no=-22
> >>> new_err_no=-22
> >>> -16> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj op
> >>> status=0
> >>> -15> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj http
> >>> status=400
> >>> -14> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== req done
> >>> req=0x7f80437f5780 op status=0 http_status=400 latency=0s ======
> >>> -13> 2021-01-28T11:36:48.822+0000 7f80437fe700 1 civetweb: 0x7f814c0cf9e8:
> >>> 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT
> >>> /<file_name>-d20201223-u115132.JPG HTTP/1.1" 400 460 - -
> >>> -12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request
> >>> req=0x7f8043ff6780 =====
> >>> -11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for
> >>> trans_id = tx000000000000000001432-006012a1d0-31197b5c-default
> >>> -10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1
> >>> -9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying
> >>> requester
> >>> -8> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj
> >>> normalizing buckets and tenants
> >>> -7> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj init
> >>> permissions
> >>> -6> 2021-01-28T11:36:48.878+0000 7f8043fff700 0 req 5170 0s NOTICE: invalid dest
> >>> placement: default-placement/REDUCED_REDUNDANCY
> >>> -5> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 op->ERRORHANDLER: err_no=-22
> >>> new_err_no=-22
> >>> -4> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj op
> >>> status=0
> >>> -3> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj http
> >>> status=400
> >>> -2> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== req done
> >>> req=0x7f8043ff6780 op status=0 http_status=400 latency=0s ======
> >>>
> >>> -1> 2021-01-28T11:36:48.886+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010:
> >>> 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT
> >>> /<file_name>-223-u115136.JPG HTTP/1.1" 400 460 - -
> >>> 0> 2021-01-28T11:36:48.910+0000 7f8128ff9700 -1 *** Caught signal (Aborted) **
> >>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 deferred set uid:gid to 64045:64045
> >>> (ceph:ceph)
> >>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 ceph version 15.2.8
> >>> (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw,
> >>> pid 30417
> >>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework: civetweb
> >>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework conf key: port, val: 443s
> >>>
> >>>
> >>> Could someone help me troubleshoot and fix the issue?
> >>>
> >>> Thanks
> >>> Andrei
> >>>
>


-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


