I have a test cluster of 4 servers running Luminous. We were running 12.2.2 under Fedora 17 and have just completed upgrading to 12.2.5 under Fedora 18.
All seems well: all MONs are up, all OSDs are up, and I can see objects stored as expected with rados -p default.rgw.buckets.data ls.
But when I start RGW, the load goes through the roof as radosgw aborts and dumps core over and over in rapid succession. The end of the log from one of the crashes follows:
-16> 2018-05-21 15:52:48.244579 7fc70eeda700 5 -- 10.19.33.13:0/3446208184 >> 10.19.33.14:6800/1417 conn(0x55e78a610800 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14567 cs=1 l=1). rx osd.6 seq 7 0x55e78a67b500 osd_op_reply(47 notify.6 [watch watch cookie 94452947886080] v1092'43446 uv43445 ondisk = 0) v8
-15> 2018-05-21 15:52:48.244619 7fc70eeda700 1 -- 10.19.33.13:0/3446208184 <== osd.6 10.19.33.14:6800/1417 7 ==== osd_op_reply(47 notify.6 [watch watch cookie 94452947886080] v1092'43446 uv43445 ondisk = 0) v8 ==== 152+0+0 (1199963694 0 0) 0x55e78a67b500 con 0x55e78a610800
-14> 2018-05-21 15:52:48.244777 7fc723656000 1 -- 10.19.33.13:0/3446208184 --> 10.19.33.15:6800/1433 -- osd_op(unknown.0.0:48 16.1 16:93e5b521:::notify.7:head [create] snapc 0=[] ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a67bc00 con 0
-13> 2018-05-21 15:52:48.275650 7fc70eeda700 5 -- 10.19.33.13:0/3446208184 >> 10.19.33.15:6800/1433 conn(0x55e78a65e000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14572 cs=1 l=1). rx osd.2 seq 7 0x55e78a678380 osd_op_reply(48 notify.7 [create] v1092'43453 uv43453 ondisk = 0) v8
-12> 2018-05-21 15:52:48.275675 7fc70eeda700 1 -- 10.19.33.13:0/3446208184 <== osd.2 10.19.33.15:6800/1433 7 ==== osd_op_reply(48 notify.7 [create] v1092'43453 uv43453 ondisk = 0) v8 ==== 152+0+0 (2720997170 0 0) 0x55e78a678380 con 0x55e78a65e000
-11> 2018-05-21 15:52:48.275849 7fc723656000 1 -- 10.19.33.13:0/3446208184 --> 10.19.33.15:6800/1433 -- osd_op(unknown.0.0:49 16.1 16:93e5b521:::notify.7:head [watch watch cookie 94452947887232] snapc 0=[] ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a688000 con 0
-10> 2018-05-21 15:52:48.296799 7fc70eeda700 5 -- 10.19.33.13:0/3446208184 >> 10.19.33.15:6800/1433 conn(0x55e78a65e000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=14572 cs=1 l=1). rx osd.2 seq 8 0x55e78a688000 osd_op_reply(49 notify.7 [watch watch cookie 94452947887232] v1092'43454 uv43453 ondisk = 0) v8
-9> 2018-05-21 15:52:48.296824 7fc70eeda700 1 -- 10.19.33.13:0/3446208184 <== osd.2 10.19.33.15:6800/1433 8 ==== osd_op_reply(49 notify.7 [watch watch cookie 94452947887232] v1092'43454 uv43453 ondisk = 0) v8 ==== 152+0+0 (3812136207 0 0) 0x55e78a688000 con 0x55e78a65e000
-8> 2018-05-21 15:52:48.296924 7fc723656000 2 all 8 watchers are set, enabling cache
-7> 2018-05-21 15:52:48.297135 7fc57cbb6700 2 garbage collection: start
-6> 2018-05-21 15:52:48.297185 7fc57c3b5700 2 object expiration: start
-5> 2018-05-21 15:52:48.297321 7fc57cbb6700 1 -- 10.19.33.13:0/3446208184 --> 10.19.33.16:6804/1596 -- osd_op(unknown.0.0:50 18.3 18:d242335b:gc::gc.2:head [call lock.lock] snapc 0=[] ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a692000 con 0
-4> 2018-05-21 15:52:48.297395 7fc57c3b5700 1 -- 10.19.33.13:0/3446208184 --> 10.19.33.16:6804/1596 -- osd_op(unknown.0.0:51 18.0 18:1a734c59:::obj_delete_at_hint.0000000000:head [call lock.lock] snapc 0=[] ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a692380 con 0
-3> 2018-05-21 15:52:48.299463 7fc568b8e700 5 schedule life cycle next start time: Tue May 22 04:00:00 2018
-2> 2018-05-21 15:52:48.299528 7fc567b8c700 5 ERROR: sync_all_users() returned ret=-2
-1> 2018-05-21 15:52:48.299698 7fc56738b700 1 -- 10.19.33.13:0/3446208184 --> 10.19.33.14:6800/1417 -- osd_op(unknown.0.0:52 18.7 18:e9187ab8:reshard::reshard.0000000000:head [call lock.lock] snapc 0=[] ondisk+write+known_if_redirected e1092) v8 -- 0x55e78a54fc00 con 0
0> 2018-05-21 15:52:48.301978 7fc723656000 -1 *** Caught signal (Aborted) **
in thread 7fc723656000 thread_name:radosgw
ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
1: (()+0x22d82c) [0x55e7861a882c]
2: (()+0x11fb0) [0x7fc719270fb0]
3: (gsignal()+0x10b) [0x7fc716603f4b]
4: (abort()+0x12b) [0x7fc7165ee591]
5: (parse_rgw_ldap_bindpw[abi:cxx11](CephContext*)+0x68b) [0x55e78647409b]
6: (rgw::auth::s3::LDAPEngine::init(CephContext*)+0xb9) [0x55e7863a38f9]
7: (rgw::auth::s3::ExternalAuthStrategy::ExternalAuthStrategy(CephContext*, RGWRados*, rgw::auth::s3::AWSEngine::VersionAbstractor*)+0x74) [0x55e786154bc4]
8: (std::__shared_ptr<rgw::auth::StrategyRegistry, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<rgw::auth::StrategyRegistry>, CephContext* const&, RGWRados* const&>(std::_Sp_make_shared_tag, std::allocator<rgw::auth::StrategyRegistry> const&, CephContext* const&, RGWRados* const&)+0xf8) [0x55e786158f78]
9: (main()+0x196b) [0x55e78614463b]
10: (__libc_start_main()+0xeb) [0x7fc7165f01bb]
11: (_start()+0x2a) [0x55e78614c3da]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
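
For anyone skimming the trace, here is a minimal sketch in Python (not part of Ceph; the helper names parse_frames and frame_that_aborted are made up for illustration) that pulls the symbol names out of a backtrace in the format above and reports the frame that called abort(). It assumes each frame line looks like "N: (symbol+0xoffset) [0xaddress]", which is what this dump shows; other Ceph versions may format frames differently.

```python
import re

# One backtrace frame: "<index>: (<symbol>+0x<offset>) [0x<address>]".
# The greedy (.*) keeps everything up to the last "+0x", so C++ symbols
# containing parentheses or templates are captured whole.
FRAME_RE = re.compile(r"^\s*(\d+):\s+\((.*)\+0x[0-9a-f]+\)\s+\[0x[0-9a-f]+\]\s*$")


def parse_frames(text):
    """Return a list of (frame_index, symbol) tuples found in the text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def frame_that_aborted(frames):
    """Return the frame listed right after abort(), i.e. its caller, or None."""
    for pos, (_, symbol) in enumerate(frames):
        if symbol.startswith("abort(") and pos + 1 < len(frames):
            return frames[pos + 1]
    return None


# Frames 1 to 7 copied from the dump above.
TRACE = """\
1: (()+0x22d82c) [0x55e7861a882c]
2: (()+0x11fb0) [0x7fc719270fb0]
3: (gsignal()+0x10b) [0x7fc716603f4b]
4: (abort()+0x12b) [0x7fc7165ee591]
5: (parse_rgw_ldap_bindpw[abi:cxx11](CephContext*)+0x68b) [0x55e78647409b]
6: (rgw::auth::s3::LDAPEngine::init(CephContext*)+0xb9) [0x55e7863a38f9]
7: (rgw::auth::s3::ExternalAuthStrategy::ExternalAuthStrategy(CephContext*, RGWRados*, rgw::auth::s3::AWSEngine::VersionAbstractor*)+0x74) [0x55e786154bc4]
"""

if __name__ == "__main__":
    print(frame_that_aborted(parse_frames(TRACE)))
```

On the trace above this reports frame 5, parse_rgw_ldap_bindpw, which is called from rgw::auth::s3::LDAPEngine::init (frame 6). So the abort happens while radosgw is setting up its LDAP authentication engine at startup, not in the OSD traffic logged before it.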
Marc D. Spencer
Chief Technology Officer, LiquidPixels