Luminous 12.2.5 - crashing RGW

Hi,
We run five RADOS Gateways on Luminous 12.2.5 as upstream servers in an active-active nginx setup based on keepalived.
The cluster consists of 12 Ceph nodes (16x 10TB BlueStore OSDs per node, 2x 10Gb network links shared by the access and cluster networks); the RGW data pool is EC 9+3.

We recently noticed the entries below in the RGW logs:

2018-07-11 06:19:13.726392 7f2eeed46700  1 ====== starting new request req=0x7f2eeed402c0 =====
2018-07-11 06:19:13.871358 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:19:58.953816 7f2eeed46700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:19:58.959424 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:20:44.088045 7f2eeed46700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:20:44.090664 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:21:29.141182 7f2eeed46700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:21:29.146598 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:22:14.178369 7f2eeed46700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:22:14.181697 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:22:34.199763 7f2eeed46700  1 ====== req done req=0x7f2eeed402c0 op status=0 http_status=200 ======
2018-07-11 06:22:34.199851 7f2eeed46700  1 civetweb: 0x5599a1158000: 10.195.17.6 - - [11/Jul/2018:06:10:11 +0000] "PUT /BUCKET/PATH/OBJECT?partNumber=2&uploadId=2~ol_fQw_u7eKRjuP1qVwnj5V12GxDYXu HTTP/1.1" 200 0 - -

These blocked requests cause 'upstream timed out (110: Connection timed out) while reading response header from upstream' errors and 504 responses on the nginx side, because of its 30-second timeout.
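
As far as I understand, the reshard operations behind these blocks can be inspected, and pending entries dropped, with radosgw-admin; something like the following (BUCKET is a placeholder):

# list reshard operations currently scheduled or in progress
radosgw-admin reshard list
# show resharding status for a single bucket
radosgw-admin reshard status --bucket=BUCKET
# drop a pending reshard entry for a bucket (as far as I know this does not stop one already running)
radosgw-admin reshard cancel --bucket=BUCKET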

Other recurring log entries look like:

2018-07-11 06:20:47.407632 7f2e97c98700  1 ====== starting new request req=0x7f2e97c922c0 =====
2018-07-11 06:20:47.412455 7f2e97c98700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:21:32.424983 7f2e97c98700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:21:32.426597 7f2e97c98700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:22:17.444467 7f2e97c98700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:22:17.492217 7f2e97c98700  0 NOTICE: resharding operation on bucket index detected, blocking

2018-07-11 06:22:32.495254 7f2e97c98700  0 ERROR: update_bucket_id() new_bucket_id=d644765c-1705-49b2-9609-a8511d3c4fed.151639.105 returned r=-125
2018-07-11 06:22:32.495386 7f2e97c98700  0 WARNING: set_req_state_err err_no=125 resorting to 500

2018-07-11 06:22:32.495509 7f2e97c98700  1 ====== req done req=0x7f2e97c922c0 op status=-125 http_status=500 ======
2018-07-11 06:22:32.495569 7f2e97c98700  1 civetweb: 0x5599a14f4000: 10.195.17.6 - - [11/Jul/2018:06:19:25 +0000] "POST PUT /BUCKET/PATH/OBJECT?uploads HTTP/1.1" 500 0 - -


To avoid the 504 and 500 responses we disabled dynamic resharding via 'rgw_dynamic_resharding = false'. We are not sure whether setting the nginx 'proxy_read_timeout' option to a value higher than the bucket resharding time is a good idea.
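
For reference, the change was just the option below in ceph.conf on each RGW node, followed by an RGW restart (the client section name matches our ceph-node-01 gateway; the exact section and systemd unit name may differ per setup):

# in ceph.conf:
#   [client.rgw.ceph-node-01]
#   rgw_dynamic_resharding = false
systemctl restart ceph-radosgw@rgw.ceph-node-01.service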

Once that was done, 'block_while_resharding ERROR: bucket is still resharding, please retry' disappeared from the RGW logs; however, another ERROR is now logged, after which the RGWs catch signal Aborted and get restarted by systemd:

2018-07-13 05:27:31.149618 7f7eb72c7700  1 ====== starting new request req=0x7f7eb72c12c0 =====
2018-07-13 05:27:52.593413 7f7eb72c7700  0 ERROR: flush_read_list(): d->client_cb->handle_data() returned -5
2018-07-13 05:27:52.594040 7f7eb72c7700  1 ====== req done req=0x7f7eb72c12c0 op status=-5 http_status=206 ======
2018-07-13 05:27:52.594633 7f7eb72c7700  1 civetweb: 0x55ab3171b000: 10.195.17.6 - - [13/Jul/2018:05:24:28 +0000] "GET /BUCKET/PATH/OBJECT_580MB HTTP/1.1" 206 0 - Hadoop 2.7.3.2.5.3.0-37, aws-sdk-java/1.10.6 Linux/4.4.0-97-generic Java_HotSpot(TM)_64-Bit_Server_VM/25.77-b03/1.8.0_77

We see ~40 such ERRORs (each a GET request for a ~580 MB object) prior to the RGW crash:

2018-07-13 05:21:43.993778 7fcce6575700  1 ====== starting new request req=0x7fcce656f2c0 =====
2018-07-13 05:22:16.137676 7fcce6575700 -1 /build/ceph-12.2.5/src/common/buffer.cc: In function 'void ceph::buffer::list::append(const ceph::buffer::ptr&, unsigned int, unsigned int)' thread 7fcce6575700 time 2018-07-13 05:22:16.135271
/build/ceph-12.2.5/src/common/buffer.cc: 1967: FAILED assert(len+off <= bp.length())

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fcd5b7aab72]
 2: (ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int, unsigned int)+0x118) [0x7fcd64993cf8]
 3: (RGWPutObj_ObjStore::get_data(ceph::buffer::list&)+0xde) [0x5597153dd06e]
 4: (RGWPutObj_ObjStore_S3::get_data(ceph::buffer::list&)+0xe) [0x55971541f5be]
 5: (RGWPutObj::execute()+0xe9b) [0x55971530e3cb]
 6: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x165) [0x559715331cb5]
 7: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, int*)+0x1dbc) [0x559715333e3c]
 8: (RGWCivetWebFrontend::process(mg_connection*)+0x38f) [0x5597151cc3ff]
 9: (()+0x1e6319) [0x559715206319]
 10: (()+0x1e7ce9) [0x559715207ce9]
 11: (()+0x76ba) [0x7fcd646c86ba]
 12: (clone()+0x6d) [0x7fcd59dd041d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


--- begin dump of recent events ---
...
...
...
 -9615> 2018-07-13 05:21:43.993778 7fcce6575700  1 ====== starting new request req=0x7fcce656f2c0 =====
 -9614> 2018-07-13 05:21:43.993802 7fcce6575700  2 req 448466:0.000024::PUT /BUCKET/PATH/OBJECT_35MB::initializing for trans_id = tx00000000000000006d7d2-005b4836e7-a22ee-default
 -9613> 2018-07-13 05:21:43.993858 7fcce6575700  2 req 448466:0.000081:s3:PUT /BUCKET/PATH/OBJECT_35MB::getting op 1
 -9612> 2018-07-13 05:21:43.993873 7fcce6575700  2 req 448466:0.000095:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:verifying requester
 -9611> 2018-07-13 05:21:43.994148 7fcce6575700  2 req 448466:0.000370:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:normalizing buckets and tenants
 -9610> 2018-07-13 05:21:43.994156 7fcce6575700  2 req 448466:0.000378:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:init permissions
 -9609> 2018-07-13 05:21:43.994246 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.20:6804/13437 -- osd_op(unknown.0.0:14249780 4.3 4:cc87b6a2:root::BUCKET01:head [call version.read,getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55971dc5e680 con 0
 -9606> 2018-07-13 05:21:43.994934 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.20:6804/13437 -- osd_op(unknown.0.0:14249781 4.3 4:cc87b6a2:root::BUCKET01:head [call version.check_conds,call version.read,read 0~524288] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55971774a340 con 0
 -9603> 2018-07-13 05:21:43.995333 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.20:6804/13437 -- osd_op(unknown.0.0:14249782 4.3 4:ccf42def:root::.bucket.meta.BUCKET01%3ad644765c-1705-49b2-9609-a8511d3c4fed.311938.82:head [call version.read,getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55973938f6c0 con 0
 -9600> 2018-07-13 05:21:43.995657 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.20:6804/13437 -- osd_op(unknown.0.0:14249783 4.3 4:ccf42def:root::.bucket.meta.BUCKET01%3ad644765c-1705-49b2-9609-a8511d3c4fed.311938.82:head [call version.check_conds,call version.read,read 0~524288] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55971811a000 con 0
 -9597> 2018-07-13 05:21:43.996145 7fcce6575700  2 req 448466:0.002368:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:recalculating target
 -9596> 2018-07-13 05:21:43.996170 7fcce6575700  2 req 448466:0.002372:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:reading permissions
 -9595> 2018-07-13 05:21:43.996174 7fcce6575700  2 req 448466:0.002397:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:init op
 -9594> 2018-07-13 05:21:43.996176 7fcce6575700  2 req 448466:0.002399:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:verifying op mask
 -9593> 2018-07-13 05:21:43.996178 7fcce6575700  2 req 448466:0.002400:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:verifying op permissions
 -9592> 2018-07-13 05:21:43.996180 7fcce6575700  5 Searching permissions for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=user01, acct_name=user01, subuser=, perm_mask=15, is_admin=0) mask=50
 -9591> 2018-07-13 05:21:43.996226 7fcce6575700  5 Searching permissions for uid=user01
 -9590> 2018-07-13 05:21:43.996285 7fcce6575700  5 Found permission: 15
 -9589> 2018-07-13 05:21:43.996288 7fcce6575700  5 Searching permissions for group=1 mask=50
 -9588> 2018-07-13 05:21:43.996289 7fcce6575700  5 Permissions for group not found
 -9587> 2018-07-13 05:21:43.996290 7fcce6575700  5 Searching permissions for group=2 mask=50
 -9586> 2018-07-13 05:21:43.996291 7fcce6575700  5 Permissions for group not found
 -9585> 2018-07-13 05:21:43.996292 7fcce6575700  5 -- Getting permissions done for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=user01, acct_name=user01, subuser=, perm_mask=15, is_admin=0), owner=user01, perm=2
 -9584> 2018-07-13 05:21:43.996294 7fcce6575700  2 req 448466:0.002516:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:verifying op params
 -9583> 2018-07-13 05:21:43.996296 7fcce6575700  2 req 448466:0.002518:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:pre-executing
 -9582> 2018-07-13 05:21:43.996298 7fcce6575700  2 req 448466:0.002520:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:executing
 -9581> 2018-07-13 05:21:43.997115 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.14:6831/27931 -- osd_op(unknown.0.0:14249784 9.5 9:b67a1efd:::d644765c-1705-49b2-9609-a8511d3c4fed.165377.2__multipart_OBJECT_35MB.meta:head [getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55971811d740 con 0
...
...
...
     0> 2018-07-13 05:22:16.137676 7fcce6575700 -1 /build/ceph-12.2.5/src/common/buffer.cc: In function 'void ceph::buffer::list::append(const ceph::buffer::ptr&, unsigned int, unsigned int)' thread 7fcce6575700 time 2018-07-13 05:22:16.135271
/build/ceph-12.2.5/src/common/buffer.cc: 1967: FAILED assert(len+off <= bp.length())

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fcd5b7aab72]
 2: (ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int, unsigned int)+0x118) [0x7fcd64993cf8]
 3: (RGWPutObj_ObjStore::get_data(ceph::buffer::list&)+0xde) [0x5597153dd06e]
 4: (RGWPutObj_ObjStore_S3::get_data(ceph::buffer::list&)+0xe) [0x55971541f5be]
 5: (RGWPutObj::execute()+0xe9b) [0x55971530e3cb]
 6: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x165) [0x559715331cb5]
 7: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, int*)+0x1dbc) [0x559715333e3c]
 8: (RGWCivetWebFrontend::process(mg_connection*)+0x38f) [0x5597151cc3ff]
 9: (()+0x1e6319) [0x559715206319]
 10: (()+0x1e7ce9) [0x559715207ce9]
 11: (()+0x76ba) [0x7fcd646c86ba]
 12: (clone()+0x6d) [0x7fcd59dd041d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-client.rgw.ceph-node-01.log
--- end dump of recent events ---
2018-07-13 05:22:16.176189 7fcce6575700 -1 *** Caught signal (Aborted) **
 in thread 7fcce6575700 thread_name:civetweb-worker



Is there any trick to enable rgw_dynamic_resharding only during off-peak hours? Is 'rgw_dynamic_resharding' an injectable setting that does not require restarting the RGWs? So far, I have only managed to change it via the configuration file with an RGW service restart.
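
If it is injectable, I would expect something along these lines to work against the RGW admin socket, with resharding of known-big buckets done manually off-peak instead (untested sketch; the socket path follows our log file naming, and BUCKET plus the shard count are placeholders):

# flip dynamic resharding at runtime via the admin socket (untested on our side)
ceph daemon /var/run/ceph/ceph-client.rgw.ceph-node-01.asok config set rgw_dynamic_resharding false
# ...and re-enable it during off-peak hours
ceph daemon /var/run/ceph/ceph-client.rgw.ceph-node-01.asok config set rgw_dynamic_resharding true
# or keep it disabled and reshard a large bucket manually off-peak
radosgw-admin bucket reshard --bucket=BUCKET --num-shards=128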

Regarding `ERROR: flush_read_list(): d->client_cb->handle_data() returned -5`, I'll try to increase the timeouts on the nginx side.

Any help with the error `/build/ceph-12.2.5/src/common/buffer.cc: 1967: FAILED assert(len+off <= bp.length())`, which seems to directly impact the RGW service state? Could it be caused by a lot of GET requests for big files? It happens only after the flood of `ERROR: flush_read_list()` messages.

Thanks
Jakub
 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
