Re: radosgw flush_read_list(): d->client_c->handle_data() returned -5

On 16-02-27 06:09, Yehuda Sadeh-Weinraub wrote:
On Wed, Feb 24, 2016 at 5:48 PM, Ben Hines <bhines@xxxxxxxxx> wrote:
Any idea what is going on here? I get these intermittently, especially with
very large files.

The client is doing Range requests on this ~50 GB file, incrementally
fetching later chunks.
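
(For illustration, the access pattern is roughly the following; this is a
minimal boto3 sketch with a made-up endpoint and bucket/key names, not our
actual client:)

    import boto3

    # Hypothetical endpoint/bucket/key, for illustration only.
    s3 = boto3.client('s3', endpoint_url='http://rgw.example.com:8080')
    bucket, key = 'somebucket', 'somefile.pkg'
    chunk = 8 * 1024 * 1024  # 8 MiB per ranged GET

    size = s3.head_object(Bucket=bucket, Key=key)['ContentLength']
    ofs = 0
    while ofs < size:
        end = min(ofs + chunk, size) - 1
        # Sends "Range: bytes=<ofs>-<end>" and reads just that slice.
        part = s3.get_object(Bucket=bucket, Key=key,
                             Range='bytes=%d-%d' % (ofs, end))['Body'].read()
        ofs = end + 1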

2016-02-24 16:30:59.669561 7fd33b7fe700  1 ====== starting new request req=0x7fd32c0879c0 =====
2016-02-24 16:30:59.669675 7fd33b7fe700  2 req 3648804:0.000114::GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::initializing for trans_id = tx00000000000000037ad24-0056ce4b43-259914b-default
2016-02-24 16:30:59.669687 7fd33b7fe700 10 host=<redacted>
2016-02-24 16:30:59.669757 7fd33b7fe700 10 s->object=<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg s->bucket=<redacted>
2016-02-24 16:30:59.669767 7fd33b7fe700  2 req 3648804:0.000206:s3:GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::getting op
2016-02-24 16:30:59.669776 7fd33b7fe700  2 req 3648804:0.000215:s3:GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:authorizing
2016-02-24 16:30:59.669785 7fd33b7fe700  2 req 3648804:0.000224:s3:GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:reading permissions
2016-02-24 16:30:59.673797 7fd33b7fe700 10 manifest: total_size = 50346000384
2016-02-24 16:30:59.673841 7fd33b7fe700  2 req 3648804:0.004280:s3:GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:init op
2016-02-24 16:30:59.673867 7fd33b7fe700 10 cache get: name=.users.uid+<redacted> : hit
2016-02-24 16:30:59.673881 7fd33b7fe700 10 cache get: name=.users.uid+<redacted> : hit
2016-02-24 16:30:59.673921 7fd33b7fe700  2 req 3648804:0.004360:s3:GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying op mask
2016-02-24 16:30:59.673929 7fd33b7fe700  2 req 3648804:0.004369:s3:GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying op permissions
2016-02-24 16:30:59.673941 7fd33b7fe700  5 Searching permissions for uid=anonymous mask=49
2016-02-24 16:30:59.673944 7fd33b7fe700  5 Permissions for user not found
2016-02-24 16:30:59.673946 7fd33b7fe700  5 Searching permissions for group=1 mask=49
2016-02-24 16:30:59.673949 7fd33b7fe700  5 Found permission: 1
2016-02-24 16:30:59.673951 7fd33b7fe700  5 Searching permissions for group=2 mask=49
2016-02-24 16:30:59.673953 7fd33b7fe700  5 Permissions for group not found
2016-02-24 16:30:59.673955 7fd33b7fe700  5 Getting permissions id=anonymous owner=<redacted> perm=1
2016-02-24 16:30:59.673957 7fd33b7fe700 10  uid=anonymous requested perm (type)=1, policy perm=1, user_perm_mask=15, acl perm=1
2016-02-24 16:30:59.673961 7fd33b7fe700  2 req 3648804:0.004400:s3:GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying op params
2016-02-24 16:30:59.673965 7fd33b7fe700  2 req 3648804:0.004404:s3:GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:executing
2016-02-24 16:30:59.674107 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=130023424 stripe_ofs=130023424 part_ofs=104857600 rule->part_size=52428800
2016-02-24 16:30:59.674193 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=134217728 stripe_ofs=134217728 part_ofs=104857600 rule->part_size=52428800
2016-02-24 16:30:59.674317 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=138412032 stripe_ofs=138412032 part_ofs=104857600 rule->part_size=52428800
2016-02-24 16:30:59.674433 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=142606336 stripe_ofs=142606336 part_ofs=104857600 rule->part_size=52428800
2016-02-24 16:31:00.046110 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=146800640 stripe_ofs=146800640 part_ofs=104857600 rule->part_size=52428800
2016-02-24 16:31:00.150966 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=150994944 stripe_ofs=150994944 part_ofs=104857600 rule->part_size=52428800
2016-02-24 16:31:00.151118 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=155189248 stripe_ofs=155189248 part_ofs=104857600 rule->part_size=52428800
2016-02-24 16:31:00.161000 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=157286400 stripe_ofs=157286400 part_ofs=157286400 rule->part_size=52428800
2016-02-24 16:31:00.199553 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=161480704 stripe_ofs=161480704 part_ofs=157286400 rule->part_size=52428800
2016-02-24 16:31:00.278308 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=165675008 stripe_ofs=165675008 part_ofs=157286400 rule->part_size=52428800
2016-02-24 16:31:00.312306 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=169869312 stripe_ofs=169869312 part_ofs=157286400 rule->part_size=52428800
2016-02-24 16:31:00.751626 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=174063616 stripe_ofs=174063616 part_ofs=157286400 rule->part_size=52428800
2016-02-24 16:31:00.833570 7fd33b7fe700  0 RGWObjManifest::operator++(): result: ofs=178257920 stripe_ofs=178257920 part_ofs=157286400 rule->part_size=52428800
2016-02-24 16:31:00.871774 7fd33b7fe700  0 ERROR: flush_read_list(): d->client_c->handle_data() returned -5

Maybe add 'debug ms = 1'?
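
For example, persistently in ceph.conf under the rgw client section (then
restart radosgw):

    [client.sm-cephrgw5]
    debug ms = 1

or at runtime through the admin socket (the .asok path below follows the
default $cluster-$name naming and may differ on your setup):

    ceph daemon /var/run/ceph/ceph-client.sm-cephrgw5.asok config set debug_ms 1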

Yehuda

I am having a similar problem. Here is my output (with object/bucket names replaced), with debug ms = 1:

2016-09-05 21:24:36.338651 7f2d7b766700 1 ====== starting new request req=0x7f2d7b7608a0 =====
2016-09-05 21:24:36.338807 7f2d7b766700 1 -- 10.169.28.44:0/1477608244 --> 10.169.5.2:6820/6099 -- osd_op(client.20247176.0:1771 17.6d5655ad default.19781482.1_SOMEOBJECT01 [getxattrs,stat] snapc 0=[] ack+read+known_if_redirected e21024) v7 -- ?+0 0x7f3030006650 con 0x7f2e0801b2b0
2016-09-05 21:24:36.339543 7f2d310cc700 1 -- 10.169.28.44:0/1477608244 <== osd.40 10.169.5.2:6820/6099 35 ==== osd_op_reply(1771 default.19781482.1_SOMEOBJECT01 [getxattrs,stat] v0'0 uv101163 ondisk = 0) v7 ==== 247+0+53526 (1646996105 0 3189628511) 0x7f2ff801ede0 con 0x7f2e0801b2b0
2016-09-05 21:24:36.339958 7f2d7b766700 1 -- 10.169.28.44:0/1477608244 --> 10.169.14.45:6844/71802 -- osd_op(client.20247176.0:1772 17.b0f772b2 default.19781482.1__shadow_SOMEOBJECT01.2~ioN9kG4y9fem98jNURGcd-hiOz0rUcY.141_24 [read 0~4194304] snapc 0=[] ack+read+known_if_redirected e21024) v7 -- ?+0 0x7f3030025250 con 0x7f2e7001e580
2016-09-05 21:24:36.339987 7f2d7b766700 1 -- 10.169.28.44:0/1477608244 --> 10.169.5.2:6836/7553 -- osd_op(client.20247176.0:1773 17.40daf8fc default.19781482.1__shadow_SOMEOBJECT01.2~ioN9kG4y9fem98jNURGcd-hiOz0rUcY.141_25 [read 0~4194304] snapc 0=[] ack+read+known_if_redirected e21024) v7 -- ?+0 0x7f3030026020 con 0x7f2ffc003030
2016-09-05 21:24:36.354647 7f2d2ca98700 1 -- 10.169.28.44:0/1477608244 <== osd.71 10.169.14.45:6844/71802 15 ==== osd_op_reply(1596 default.19781482.1__shadow_SOMEOBJECT02.2~mrZUl8RCA1QjnfDAMIFDgbddOQDBMGi.123_42 [read 0~4194304] v0'0 uv101395 ondisk = 0) v7 ==== 258+0+4194304 (2751466370 0 2582022669) 0x7f2e0400d2f0 con 0x7f2e7001e580
2016-09-05 21:24:36.355813 7f2d93f97700 0 ERROR: flush_read_list(): d->client_c->handle_data() returned -5
2016-09-05 21:24:36.356265 7f2d93f97700 0 WARNING: set_req_state_err err_no=5 resorting to 500
2016-09-05 21:24:36.356290 7f2d93f97700 0 ERROR: s->cio->send_content_length() returned err=-5
2016-09-05 21:24:36.356293 7f2d93f97700 0 ERROR: s->cio->print() returned err=-5
2016-09-05 21:24:36.356294 7f2d93f97700 0 ERROR: STREAM_IO(s)->print() returned err=-5
2016-09-05 21:24:36.356299 7f2d93f97700 0 ERROR: STREAM_IO(s)->complete_header() returned err=-5
2016-09-05 21:24:36.356348 7f2d93f97700 1 -- 10.169.28.44:0/1477608244 --> 10.169.5.2:6836/7553 -- osd_op(client.20247176.0:1774 11.cddb167c 2016-09-05-21-default.19781482.1-uber-mysql [append 0~556] snapc 0=[] ack+ondisk+write+known_if_redirected e21024) v7 -- ?+0 0x7f2f7400df40 con 0x7f2ffc003030
2016-09-05 21:24:36.356373 7f2d93f97700 1 ====== req done req=0x7f2d93f918a0 op status=-5 http_status=500 ======
2016-09-05 21:24:36.356437 7f2d93f97700 1 civetweb: 0x7f2f74001c70: 10.169.26.22 - - [05/Sep/2016:21:24:33 +0000] "GET /SOME_BUCKET/SOMEOBJECT02 HTTP/1.1" 500 0 - aws-cli/1.10.20 Python/2.7.3 Linux/3.18.27-031827-generic botocore/1.4.11

Please note that right before the failure another RGW object is mentioned (a shadow object of SOMEOBJECT02 rather than SOMEOBJECT01). It looks wrong... I am on 10.2.2.
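
For reference, the -5 that keeps coming back in these logs is -EIO (errno 5);
a quick check in Python:

    >>> import errno, os
    >>> errno.EIO, os.strerror(errno.EIO)
    (5, 'Input/output error')

As far as I understand, in flush_read_list() this is the return from writing
the data out to the HTTP client, so it often just means the client closed or
timed out the connection mid-download rather than a RADOS-side read failure,
though that may not be the whole story here.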

2016-02-24 16:31:00.872480 7fd33b7fe700  0 WARNING: set_req_state_err err_no=5 resorting to 500
2016-02-24 16:31:00.872561 7fd33b7fe700  0 ERROR: s->cio->print() returned err=-5
2016-02-24 16:31:00.872567 7fd33b7fe700  0 ERROR: s->cio->print() returned err=-5
2016-02-24 16:31:00.872571 7fd33b7fe700  0 ERROR: s->cio->print() returned err=-5
2016-02-24 16:31:00.872592 7fd33b7fe700  0 ERROR: s->cio->complete_header() returned err=-5
2016-02-24 16:31:00.872616 7fd33b7fe700  2 req 3648804:1.203055:s3:GET /<redacted>/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:http status=500
2016-02-24 16:31:00.872629 7fd33b7fe700  1 ====== req done req=0x7fd32c0879c0 http_status=500 ======

Relevant ceph.conf settings:

[client.sm-cephrgw5]
rgw enable ops log = False
rgw frontends = civetweb port=8080 num_threads=75 error_log_file=/var/log/ceph/civetweb.error.log access_log_file=/var/log/ceph/civetweb.access.log
rgw num rados handles = 10
rgw cache lru size = 30000
debug civetweb = 10
rgw override bucket index max shards = 23
debug rgw = 10

Ceph 9.2.0, cluster is HEALTH_OK

-Ben
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

