RGW flush_read_list error

In Luminous 12.2.1, while repeatedly running a GET on a large (1GB) file
from RGW for an hour, the following error was hit intermittently a number
of times. The first occurrence was after about 45 minutes, and the error
then appeared frequently for the remainder of the test.

ERROR: flush_read_list(): d->client_cb->handle_data() returned -5
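
For reference, the access pattern is just the same GET in a loop for an
hour. Our harness uses aws-sdk-java/1.9.0; the boto3 sketch below is only
an illustrative equivalent, and the endpoint and credentials in it are
placeholders.

import time
import boto3

# Placeholder RGW endpoint and credentials, not our actual values.
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

deadline = time.time() + 3600  # run for one hour
while time.time() < deadline:
    obj = s3.get_object(Bucket="bucket100", Key="testfile.tst")
    # Stream the full ~1GB body; each GET exercises flush_read_list()
    # on the RGW side as it pushes read buffers back to the client.
    for _ in obj["Body"].iter_chunks(chunk_size=4 * 1024 * 1024):
        pass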

Here is some more context from the rgw log around one of the failures.

2017-10-10 18:20:32.321681 I | rgw: 2017-10-10 18:20:32.321643
7f8929f41700 1 civetweb: 0x55bd25899000: 10.32.0.1 - -
[10/Oct/2017:18:19:07 +0000] "GET /bucket100/testfile.tst HTTP/1.1" 1 0 -
aws-sdk-java/1.9.0 Linux/4.4.0-93-generic
OpenJDK_64-Bit_Server_VM/25.131-b11/1.8.0_131
2017-10-10 18:20:32.383855 I | rgw: 2017-10-10 18:20:32.383786
7f8924736700 1 ====== starting new request req=0x7f892472f140 =====
2017-10-10 18:20:46.605668 I | rgw: 2017-10-10 18:20:46.605576
7f894af83700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
returned -5
2017-10-10 18:20:46.605934 I | rgw: 2017-10-10 18:20:46.605914
7f894af83700 1 ====== req done req=0x7f894af7c140 op status=-5
http_status=200 ======
2017-10-10 18:20:46.606249 I | rgw: 2017-10-10 18:20:46.606225
7f8924736700 0 ERROR: flush_read_list(): d->client_cb->handle_data()
returned -5

I don't see anything else standing out in the log. The object store was
configured with an erasure-coded data pool with k=2 and m=1.
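
For completeness, that data pool is roughly the plain-CLI equivalent of the
sketch below (the cluster is deployed through Rook, so the profile/pool
names and PG counts here are placeholders, not what Rook actually creates).

import subprocess

def ceph(*args):
    """Run a ceph CLI command and return its stdout."""
    return subprocess.run(
        ["ceph", *args], check=True, capture_output=True, text=True
    ).stdout

# k=2 data chunks plus m=1 coding chunk: tolerates losing one OSD.
ceph("osd", "erasure-code-profile", "set", "rgw-ec-2-1", "k=2", "m=1")
# Erasure-coded pool backing the RGW bucket data (PG counts are guesses).
ceph("osd", "pool", "create", "rgw.buckets.data.ec", "32", "32",
     "erasure", "rgw-ec-2-1")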

There are a number of threads around this, but I don't see a resolution.
Is there a tracking issue for this?
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007756.html
https://www.spinics.net/lists/ceph-users/msg16117.html
https://www.spinics.net/lists/ceph-devel/msg37657.html


Here's our tracking Rook issue.
https://github.com/rook/rook/issues/1067


Thanks,
Travis



On 10/10/17, 3:05 PM, "ceph-users on behalf of Jack"
<ceph-users-bounces@xxxxxxxxxxxxxx on behalf of ceph@xxxxxxxxxxxxxx> wrote:

>Hi,
>
>I would like some information about the following
>
>Let's say I have a running cluster with 4 OSDs: 2 SSDs and 2 HDDs.
>My single pool has size=3, min_size=2.
>
>For a write-only pattern, I thought I would get SSD-level performance,
>because the write would be acked as soon as min_size OSDs acked it.
>
>But am I right?
>
>(The same setup could involve some high-latency OSDs, in the case of a
>country-level cluster.)
>_______________________________________________
>ceph-users mailing list
>ceph-users@xxxxxxxxxxxxxx
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


