Re: Luminous 12.2.5 - crushable RGW

Hi. I have been encountering this problem recently.
The clients get timeout exceptions while RGW is resharding.
Can we reduce the impact of dynamic resharding by adjusting some configuration options, or should we just increase the timeout threshold on the client side?
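For reference, these are the knobs I am considering (just a sketch based on the option names in the Luminous docs; the bucket name and shard count below are made up for illustration):

# ceph.conf, [client.rgw.*] section -- resharding-related options
rgw_dynamic_resharding = true        # keep dynamic resharding enabled
rgw_max_objs_per_shard = 100000      # default; a reshard is triggered around this many objects per shard
rgw_reshard_thread_interval = 600    # seconds between scans of the reshard queue

# alternatively, pre-shard a bucket that is known to grow, so dynamic resharding never has to kick in:
radosgw-admin bucket reshard --bucket=my-big-bucket --num-shards=128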

On Mon, 16 Jul 2018 at 23:25, Jakub Jaszewski <jaszewski.jakub@xxxxxxxxx> wrote:
Hi,
We run 5 RADOS Gateways on Luminous 12.2.5 as upstream servers in an nginx active-active setup based on keepalived.
The cluster is 12 Ceph nodes (16x 10 TB BlueStore OSDs per node, 2x 10 Gb network links shared by the access and cluster networks); the RGW data pool is EC 9+3.
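For reference, the relevant part of the nginx side looks roughly like this (hostnames, port and timeout are placeholders, not our exact config):

upstream radosgw {
    server rgw1.example.local:7480;   # civetweb default port
    server rgw2.example.local:7480;
    # ... 5 RGW upstreams in total
}
server {
    listen 80;
    location / {
        proxy_pass http://radosgw;
        proxy_read_timeout 30s;       # the timeout behind the 504s described below
    }
}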

We recently noticed the entries below in the RGW logs:

2018-07-11 06:19:13.726392 7f2eeed46700  1 ====== starting new request req=0x7f2eeed402c0 =====
2018-07-11 06:19:13.871358 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:19:58.953816 7f2eeed46700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:19:58.959424 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:20:44.088045 7f2eeed46700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:20:44.090664 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:21:29.141182 7f2eeed46700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:21:29.146598 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:22:14.178369 7f2eeed46700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:22:14.181697 7f2eeed46700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:22:34.199763 7f2eeed46700  1 ====== req done req=0x7f2eeed402c0 op status=0 http_status=200 ======
2018-07-11 06:22:34.199851 7f2eeed46700  1 civetweb: 0x5599a1158000: 10.195.17.6 - - [11/Jul/2018:06:10:11 +0000] "PUT /BUCKET/PATH/OBJECT?partNumber=2&uploadId=2~ol_fQw_u7eKRjuP1qVwnj5V12GxDYXu HTTP/1.1" 200 0 - -

This causes 'upstream timed out (110: Connection timed out) while reading response header from upstream' errors and 504 response codes on the nginx side, due to the 30-second timeout there.

Other recurring log entries look like:

2018-07-11 06:20:47.407632 7f2e97c98700  1 ====== starting new request req=0x7f2e97c922c0 =====
2018-07-11 06:20:47.412455 7f2e97c98700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:21:32.424983 7f2e97c98700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:21:32.426597 7f2e97c98700  0 NOTICE: resharding operation on bucket index detected, blocking
2018-07-11 06:22:17.444467 7f2e97c98700  0 block_while_resharding ERROR: bucket is still resharding, please retry
2018-07-11 06:22:17.492217 7f2e97c98700  0 NOTICE: resharding operation on bucket index detected, blocking

2018-07-11 06:22:32.495254 7f2e97c98700  0 ERROR: update_bucket_id() new_bucket_id=d644765c-1705-49b2-9609-a8511d3c4fed.151639.105 returned r=-125
2018-07-11 06:22:32.495386 7f2e97c98700  0 WARNING: set_req_state_err err_no=125 resorting to 500

2018-07-11 06:22:32.495509 7f2e97c98700  1 ====== req done req=0x7f2e97c922c0 op status=-125 http_status=500 ======
2018-07-11 06:22:32.495569 7f2e97c98700  1 civetweb: 0x5599a14f4000: 10.195.17.6 - - [11/Jul/2018:06:19:25 +0000] "POST PUT /BUCKET/PATH/OBJECT?uploads HTTP/1.1" 500 0 - -


To avoid the 504 and 500 responses we disabled dynamic resharding via 'rgw_dynamic_resharding = false'. We are not sure whether setting the nginx 'proxy_read_timeout' option to a value higher than the bucket resharding time is a good idea.
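Concretely, the change we made, plus the nginx option we are unsure about (values shown are examples only):

# /etc/ceph/ceph.conf on each RGW host, followed by a restart of the radosgw service
[client.rgw.ceph-node-01]
rgw_dynamic_resharding = false

# nginx: read timeout raised above the longest observed reshard time, e.g.
location / {
    proxy_pass http://radosgw;
    proxy_read_timeout 300s;   # currently 30s; not sure raising it is the right approach
}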

Once that was done, 'block_while_resharding ERROR: bucket is still resharding, please retry' disappeared from the RGW logs. However, another ERROR is now logged, and then the RGWs catch signal Aborted and get restarted by systemd:

2018-07-13 05:27:31.149618 7f7eb72c7700  1 ====== starting new request req=0x7f7eb72c12c0 =====
2018-07-13 05:27:52.593413 7f7eb72c7700  0 ERROR: flush_read_list(): d->client_cb->handle_data() returned -5
2018-07-13 05:27:52.594040 7f7eb72c7700  1 ====== req done req=0x7f7eb72c12c0 op status=-5 http_status=206 ======
2018-07-13 05:27:52.594633 7f7eb72c7700  1 civetweb: 0x55ab3171b000: 10.195.17.6 - - [13/Jul/2018:05:24:28 +0000] "GET /BUCKET/PATH/OBJECT_580MB HTTP/1.1" 206 0 - Hadoop 2.7.3.2.5.3.0-37, aws-sdk-java/1.10.6 Linux/4.4.0-97-generic Java_HotSpot(TM)_64-Bit_Server_VM/25.77-b03/1.8.0_77

We see ~40 such ERRORs (each a GET request for an ~580 MB object) prior to the RGW crash:

2018-07-13 05:21:43.993778 7fcce6575700  1 ====== starting new request req=0x7fcce656f2c0 =====
2018-07-13 05:22:16.137676 7fcce6575700 -1 /build/ceph-12.2.5/src/common/buffer.cc: In function 'void ceph::buffer::list::append(const ceph::buffer::ptr&, unsigned int, unsigned int)' thread 7fcce6575700 time 2018-07-13 05:22:16.135271
/build/ceph-12.2.5/src/common/buffer.cc: 1967: FAILED assert(len+off <= bp.length())

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fcd5b7aab72]
 2: (ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int, unsigned int)+0x118) [0x7fcd64993cf8]
 3: (RGWPutObj_ObjStore::get_data(ceph::buffer::list&)+0xde) [0x5597153dd06e]
 4: (RGWPutObj_ObjStore_S3::get_data(ceph::buffer::list&)+0xe) [0x55971541f5be]
 5: (RGWPutObj::execute()+0xe9b) [0x55971530e3cb]
 6: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x165) [0x559715331cb5]
 7: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, int*)+0x1dbc) [0x559715333e3c]
 8: (RGWCivetWebFrontend::process(mg_connection*)+0x38f) [0x5597151cc3ff]
 9: (()+0x1e6319) [0x559715206319]
 10: (()+0x1e7ce9) [0x559715207ce9]
 11: (()+0x76ba) [0x7fcd646c86ba]
 12: (clone()+0x6d) [0x7fcd59dd041d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


--- begin dump of recent events ---
...
...
...
 -9615> 2018-07-13 05:21:43.993778 7fcce6575700  1 ====== starting new request req=0x7fcce656f2c0 =====
 -9614> 2018-07-13 05:21:43.993802 7fcce6575700  2 req 448466:0.000024::PUT /BUCKET/PATH/OBJECT_35MB::initializing for trans_id = tx00000000000000006d7d2-005b4836e7-a22ee-default
 -9613> 2018-07-13 05:21:43.993858 7fcce6575700  2 req 448466:0.000081:s3:PUT /BUCKET/PATH/OBJECT_35MB::getting op 1
 -9612> 2018-07-13 05:21:43.993873 7fcce6575700  2 req 448466:0.000095:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:verifying requester
 -9611> 2018-07-13 05:21:43.994148 7fcce6575700  2 req 448466:0.000370:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:normalizing buckets and tenants
 -9610> 2018-07-13 05:21:43.994156 7fcce6575700  2 req 448466:0.000378:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:init permissions
 -9609> 2018-07-13 05:21:43.994246 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.20:6804/13437 -- osd_op(unknown.0.0:14249780 4.3 4:cc87b6a2:root::BUCKET01:head [call version.read,getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55971dc5e680 con 0
 -9606> 2018-07-13 05:21:43.994934 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.20:6804/13437 -- osd_op(unknown.0.0:14249781 4.3 4:cc87b6a2:root::BUCKET01:head [call version.check_conds,call version.read,read 0~524288] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55971774a340 con 0
 -9603> 2018-07-13 05:21:43.995333 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.20:6804/13437 -- osd_op(unknown.0.0:14249782 4.3 4:ccf42def:root::.bucket.meta.BUCKET01%3ad644765c-1705-49b2-9609-a8511d3c4fed.311938.82:head [call version.read,getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55973938f6c0 con 0
 -9600> 2018-07-13 05:21:43.995657 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.20:6804/13437 -- osd_op(unknown.0.0:14249783 4.3 4:ccf42def:root::.bucket.meta.BUCKET01%3ad644765c-1705-49b2-9609-a8511d3c4fed.311938.82:head [call version.check_conds,call version.read,read 0~524288] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55971811a000 con 0
 -9597> 2018-07-13 05:21:43.996145 7fcce6575700  2 req 448466:0.002368:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:recalculating target
 -9596> 2018-07-13 05:21:43.996170 7fcce6575700  2 req 448466:0.002372:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:reading permissions
 -9595> 2018-07-13 05:21:43.996174 7fcce6575700  2 req 448466:0.002397:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:init op
 -9594> 2018-07-13 05:21:43.996176 7fcce6575700  2 req 448466:0.002399:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:verifying op mask
 -9593> 2018-07-13 05:21:43.996178 7fcce6575700  2 req 448466:0.002400:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:verifying op permissions
 -9592> 2018-07-13 05:21:43.996180 7fcce6575700  5 Searching permissions for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=user01, acct_name=user01, subuser=, perm_mask=15, is_admin=0) mask=50
 -9591> 2018-07-13 05:21:43.996226 7fcce6575700  5 Searching permissions for uid=user01
 -9590> 2018-07-13 05:21:43.996285 7fcce6575700  5 Found permission: 15
 -9589> 2018-07-13 05:21:43.996288 7fcce6575700  5 Searching permissions for group=1 mask=50
 -9588> 2018-07-13 05:21:43.996289 7fcce6575700  5 Permissions for group not found
 -9587> 2018-07-13 05:21:43.996290 7fcce6575700  5 Searching permissions for group=2 mask=50
 -9586> 2018-07-13 05:21:43.996291 7fcce6575700  5 Permissions for group not found
 -9585> 2018-07-13 05:21:43.996292 7fcce6575700  5 -- Getting permissions done for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=user01, acct_name=user01, subuser=, perm_mask=15, is_admin=0), owner=user01, perm=2
 -9584> 2018-07-13 05:21:43.996294 7fcce6575700  2 req 448466:0.002516:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:verifying op params
 -9583> 2018-07-13 05:21:43.996296 7fcce6575700  2 req 448466:0.002518:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:pre-executing
 -9582> 2018-07-13 05:21:43.996298 7fcce6575700  2 req 448466:0.002520:s3:PUT /BUCKET/PATH/OBJECT_35MB:put_obj:executing
 -9581> 2018-07-13 05:21:43.997115 7fcce6575700  1 -- 10.195.17.12:0/171535574 --> 10.195.17.14:6831/27931 -- osd_op(unknown.0.0:14249784 9.5 9:b67a1efd:::d644765c-1705-49b2-9609-a8511d3c4fed.165377.2__multipart_OBJECT_35MB.meta:head [getxattrs,stat] snapc 0=[] ondisk+read+known_if_redirected e2230) v8 -- 0x55971811d740 con 0
...
...
...
     0> 2018-07-13 05:22:16.137676 7fcce6575700 -1 /build/ceph-12.2.5/src/common/buffer.cc: In function 'void ceph::buffer::list::append(const ceph::buffer::ptr&, unsigned int, unsigned int)' thread 7fcce6575700 time 2018-07-13 05:22:16.135271
/build/ceph-12.2.5/src/common/buffer.cc: 1967: FAILED assert(len+off <= bp.length())

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x7fcd5b7aab72]
 2: (ceph::buffer::list::append(ceph::buffer::ptr const&, unsigned int, unsigned int)+0x118) [0x7fcd64993cf8]
 3: (RGWPutObj_ObjStore::get_data(ceph::buffer::list&)+0xde) [0x5597153dd06e]
 4: (RGWPutObj_ObjStore_S3::get_data(ceph::buffer::list&)+0xe) [0x55971541f5be]
 5: (RGWPutObj::execute()+0xe9b) [0x55971530e3cb]
 6: (rgw_process_authenticated(RGWHandler_REST*, RGWOp*&, RGWRequest*, req_state*, bool)+0x165) [0x559715331cb5]
 7: (process_request(RGWRados*, RGWREST*, RGWRequest*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rgw::auth::StrategyRegistry const&, RGWRestfulIO*, OpsLogSocket*, int*)+0x1dbc) [0x559715333e3c]
 8: (RGWCivetWebFrontend::process(mg_connection*)+0x38f) [0x5597151cc3ff]
 9: (()+0x1e6319) [0x559715206319]
 10: (()+0x1e7ce9) [0x559715207ce9]
 11: (()+0x76ba) [0x7fcd646c86ba]
 12: (clone()+0x6d) [0x7fcd59dd041d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 compressor
   1/ 5 bluestore
   1/ 5 bluefs
   1/ 3 bdev
   1/ 5 kstore
   4/ 5 rocksdb
   4/ 5 leveldb
   4/ 5 memdb
   1/ 5 kinetic
   1/ 5 fuse
   1/ 5 mgr
   1/ 5 mgrc
   1/ 5 dpdk
   1/ 5 eventtrace
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-client.rgw.ceph-node-01.log
--- end dump of recent events ---
2018-07-13 05:22:16.176189 7fcce6575700 -1 *** Caught signal (Aborted) **
 in thread 7fcce6575700 thread_name:civetweb-worker



Is there any trick to enable rgw_dynamic_resharding only during off-peak hours? Is 'rgw_dynamic_resharding' an injectable setting that requires no RGW restart? So far, I have only managed to change it via the configuration file with an RGW service restart.
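What I had in mind, in case it helps the discussion (untested sketch; I do not know whether the reshard thread actually re-reads the value at runtime, hence the question):

# run from cron on each RGW host, via the admin socket
# off-peak window starts: allow dynamic resharding
ceph daemon client.rgw.ceph-node-01 config set rgw_dynamic_resharding true
# peak window starts: disable it again
ceph daemon client.rgw.ceph-node-01 config set rgw_dynamic_resharding false

# verify the in-memory value
ceph daemon client.rgw.ceph-node-01 config get rgw_dynamic_resharding

# pending reshard jobs can also be listed/processed manually during the window:
radosgw-admin reshard list
radosgw-admin reshard process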

Regarding `ERROR: flush_read_list(): d->client_cb->handle_data() returned -5`, I'll try to increase the timeouts on the nginx side.
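The -5 looks like EIO from the frontend when it can no longer write the response to the client socket, so my guess is nginx closing the upstream connection mid-transfer. The directives I plan to raise (example values only):

location / {
    proxy_pass http://radosgw;
    proxy_read_timeout  300s;   # idle timeout between reads from RGW
    proxy_send_timeout  300s;   # idle timeout between writes to RGW
    send_timeout        300s;   # idle timeout between writes to the client
    proxy_buffering     off;    # stream large GETs instead of buffering them in nginx
}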

Any help with the error `/build/ceph-12.2.5/src/common/buffer.cc: 1967: FAILED assert(len+off <= bp.length())`, which seems to directly impact the RGW service state? Could it be caused by a lot of GET requests for big files? It happens only after the flood of `ERROR: flush_read_list()` messages.

Thanks
Jakub
 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
