Hi Travis,
This error is reported when sending data back to the client (the -5 is EIO).
Generally it means that the client timed out and closed the connection.
Are you also seeing failures on the client side?
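If a client-side timeout is the cause, one mitigation worth trying is to raise
the socket timeout on the Java client shown in the log below. A minimal sketch,
assuming the aws-sdk-java 1.x API; the endpoint, credentials, and timeout value
are placeholders, not values from this thread:

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.S3ClientOptions;

    public class RgwClientFactory {
        // Build an S3 client with a raised socket timeout so a slow 1 GB GET
        // is not aborted client-side (the SDK default socket timeout is 50 s).
        public static AmazonS3 build() {
            ClientConfiguration cfg = new ClientConfiguration();
            cfg.setSocketTimeout(5 * 60 * 1000);          // 5 minutes; illustrative value
            AmazonS3Client s3 = new AmazonS3Client(
                    new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"), cfg);
            s3.setEndpoint("http://rgw.example.com:80");  // placeholder RGW endpoint
            s3.setS3ClientOptions(new S3ClientOptions().withPathStyleAccess(true));
            return s3;
        }
    }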
Casey
On 10/10/2017 06:45 PM, Travis Nielsen wrote:
In Luminous 12.2.1, when repeatedly running a GET on a large (1 GB) file
from RGW for an hour, the following error was hit intermittently a number
of times. The first error was hit after 45 minutes, and the error then
happened frequently for the remainder of the test.
ERROR: flush_read_list(): d->client_cb->handle_data() returned -5
Here is some more context from the rgw log around one of the failures.
2017-10-10 18:20:32.321681 I | rgw: 2017-10-10 18:20:32.321643
7f8929f41700 1 civetweb: 0x55bd25899000: 10.32.0.1 - -
[10/Oct/2017:18:19:07 +0000] "GET /bucket100/testfile.tst HTTP/1.1" 1 0 -
aws-sdk-java/1.9.0 Linux/4.4.0-93-generic
OpenJDK_64-Bit_Server_VM/25.131-b11/1.8.0_131
2017-10-10 18:20:32.383855 I | rgw: 2017-10-10 18:20:32.383786
7f8924736700 1 ====== starting new request req=0x7f892472f140 =====
2017-10-10 18:20:46.605668 I | rgw: 2017-10-10 18:20:46.605576
7f894af83700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
returned -5
2017-10-10 18:20:46.605934 I | rgw: 2017-10-10 18:20:46.605914
7f894af83700 1 ====== req done req=0x7f894af7c140 op status=-5
http_status=200 ======
2017-10-10 18:20:46.606249 I | rgw: 2017-10-10 18:20:46.606225
7f8924736700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
returned -5
I don't see anything else standing out in the log. The object store was
configured with an erasure-coded data pool with k=2 and m=1.
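For reference, a minimal sketch of the repeated-GET loop described above,
assuming the aws-sdk-java 1.x client seen in the log (bucket and object names
are taken from the log; the client setup, endpoint, and credentials are
placeholders). The loop drains the whole response body; dropping the stream
mid-read would close the connection, which presumably is what RGW reports as
handle_data() returning -5:

    import java.io.InputStream;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.S3ClientOptions;
    import com.amazonaws.services.s3.model.S3Object;

    public class RepeatedGetTest {
        public static void main(String[] args) throws Exception {
            // Placeholder client setup; endpoint and credentials are illustrative.
            AmazonS3Client s3 = new AmazonS3Client(
                    new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
            s3.setEndpoint("http://rgw.example.com:80");
            s3.setS3ClientOptions(new S3ClientOptions().withPathStyleAccess(true));

            byte[] buf = new byte[1 << 20];              // 1 MiB read buffer
            for (int i = 0; i < 1000; i++) {
                S3Object obj = s3.getObject("bucket100", "testfile.tst");  // names from the log
                InputStream in = obj.getObjectContent();
                try {
                    // Drain the full body; abandoning the stream mid-read closes
                    // the connection to RGW.
                    while (in.read(buf) != -1) { /* discard */ }
                } finally {
                    in.close();
                }
            }
        }
    }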
There are a number of threads around this, but I don't see a resolution.
Is there a tracking issue for this?
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007756.html
https://www.spinics.net/lists/ceph-users/msg16117.html
https://www.spinics.net/lists/ceph-devel/msg37657.html
Here's our tracking Rook issue.
https://github.com/rook/rook/issues/1067
Thanks,
Travis
On 10/10/17, 3:05 PM, "ceph-users on behalf of Jack"
<ceph-users-bounces@xxxxxxxxxxxxxx on behalf of ceph@xxxxxxxxxxxxxx> wrote:
Hi,
I would like some information about the following.
Let's say I have a running cluster with 4 OSDs: 2 SSDs and 2 HDDs.
My single pool has size=3, min_size=2.
For a write-only pattern, I thought I would get SSD-level performance,
because the write would be acked as soon as min_size OSDs had acked it.
Am I right?
(The same setup could involve some high-latency OSDs, e.g. in a
country-level cluster.)
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com