Re: Problem: silently corrupted RadosGW objects caused by slow requests

Thanks for your reply.

> > 2016-02-23 13:49:58.818640 osd.260 10.176.67.27:6800/688083 2119 : [WRN] 4
> > slow requests, 4 included below; oldest blocked for > 30.727096 secs
> > 2016-02-23 13:49:58.818673 osd.260 10.176.67.27:6800/688083 2120 : [WRN]
> > slow request 30.727096 seconds old, received at 2016-02-23 13:49:28.091460:
> > osd_op(client.47792965.0:185007087 default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2
> > [writefull 0~524288] 10.ce729ebe e107594) v4 currently waiting for subops from
> > [469,9]
> Did these requests ever finish?
There is no more info in ceph.log (is there any other way to check it?).
...but the related RADOS object is complete and seems to have the correct mtime (2016-02-23T12:49:28+00:00, the same as the time of the HTTP_500 and the "received at" time of the slow_req):
.rgw.buckets/default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2 mtime 1456231768, size 2097152   

The previous object also has the same mtime:
.rgw.buckets/default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_1 mtime 1456231768, size 4194304

But the first object of this multipart part is empty and has a different mtime (2016-02-23T12:50:00+00:00, 32 s later, during the slow_req and around the time of the next HTTP_200 request):
.rgw.buckets/default.14654.445__multipart_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57 mtime 1456231800, size 0
There was no slow_req info about this object or its OSDs. It seems that the "empty" state was caused by the slow_req on the latter object.
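To double-check the timeline above, here is a minimal Python sketch that converts the integer `rados stat` mtimes to UTC and compares them with the "received at" time of the slow request. It assumes that ceph.log timestamps are local time at UTC+1 (consistent with the `+0100` in the access log); the helper names are made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: ceph.log uses local time, UTC+1 in February (matches "+0100" in the access log).
LOCAL_TZ = timezone(timedelta(hours=1))

RECEIVED_RE = re.compile(r"received at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)")


def rados_mtime_utc(epoch_seconds):
    """Convert the integer mtime printed by `rados stat` into an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def slow_request_received(line, tz=LOCAL_TZ):
    """Return the 'received at' timestamp of a slow-request warning as an aware datetime."""
    match = RECEIVED_RE.search(line)
    if match is None:
        raise ValueError("no 'received at' timestamp in line")
    naive = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=tz)


slow = ("slow request 30.727096 seconds old, received at "
        "2016-02-23 13:49:28.091460: osd_op(client.47792965.0:185007087 ...)")
received = slow_request_received(slow)
shadow_2 = rados_mtime_utc(1456231768)        # ..._shadow_..._2
multipart_head = rados_mtime_utc(1456231800)  # ..._multipart_...57 (size 0)

print(shadow_2.isoformat())                             # 2016-02-23T12:49:28+00:00
print(round((shadow_2 - received).total_seconds(), 3))  # -0.091 (mtime has whole-second resolution)
print((multipart_head - shadow_2).total_seconds())      # 32.0
```

The sketch only makes the time-zone handling explicit: the shadow object's mtime falls within a second of the slow request's "received at" time, and the empty head was written 32 s later.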

> > 127.0.0.1 - - [23/Feb/2016:13:49:28 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57 HTTP/1.0" 500 751 "-" "Boto/2.31.1 Python/2.7.3
> > Linux/3.13.0-39-generic(syncworker)"
> > 127.0.0.1 - - [23/Feb/2016:13:49:58 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3
> > Linux/3.13.0-39-generic(syncworker)"
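Parts that got an HTTP 500 and were then re-uploaded with an HTTP 200, like part 57 above, are candidates for this problem, so the access logs that are still available could be scanned for that pattern. A minimal Python sketch, assuming the Apache combined log format shown above with one request per line (the excerpt above is wrapped) and English month names; `retried_parts` is a hypothetical helper, not part of any Ceph tool.

```python
import re
from collections import defaultdict
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# host ident user [time] "METHOD path PROTOCOL" status ...
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def retried_parts(lines):
    """Return {(object path, uploadId, partNumber): [(time, status), ...]} for
    multipart part uploads that got a 5xx and were later attempted again."""
    history = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or m.group("method") != "PUT":
            continue
        url = urlsplit(m.group("path"))
        query = parse_qs(url.query)
        if "uploadId" not in query or "partNumber" not in query:
            continue  # not a multipart part upload
        key = (url.path, query["uploadId"][0], int(query["partNumber"][0]))
        # %b assumes English month abbreviations (C locale).
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        history[key].append((when, int(m.group("status"))))

    retried = {}
    for key, events in history.items():
        events.sort()
        # A 5xx that is not the last attempt means the client retried the part.
        if any(status >= 500 for _, status in events[:-1]):
            retried[key] = events
    return retried


obj = "/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv"
qs = "uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57"
agent = '"Boto/2.31.1 Python/2.7.3 Linux/3.13.0-39-generic(syncworker)"'
log = [
    f'127.0.0.1 - - [23/Feb/2016:13:49:28 +0100] "PUT {obj}?{qs} HTTP/1.0" 500 751 "-" {agent}',
    f'127.0.0.1 - - [23/Feb/2016:13:49:58 +0100] "PUT {obj}?{qs} HTTP/1.0" 200 221 "-" {agent}',
]
for key, events in retried_parts(log).items():
    print(key[2], [status for _, status in events])  # 57 [500, 200]
```

The resulting (object, uploadId, partNumber) keys map directly to the `__multipart_` and `__shadow_` RADOS object names, which can then be checked with `rados stat`.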

> Thank you. I think you provided some info here that will hopefully
> allow us to identify the root cause.
We have a lot of such S3 objects with empty or missing RADOS parts, but of course only limited logs (because of log rotation).
Right now we are installing a test cluster. We have ways to trigger floods of slow_reqs :).
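With many affected S3 objects, the `rados stat` output could be checked in bulk instead of by hand. A minimal Python sketch, assuming the output format quoted earlier in this mail (`<pool>/<object> mtime <epoch>, size <bytes>`; other Ceph releases may print the mtime as a date instead). The helper names are hypothetical, and missing parts have to be detected separately, since `rados stat` fails for a missing object instead of printing a line.

```python
import re
from collections import defaultdict

# `rados stat` line as quoted above: <pool>/<object> mtime <epoch seconds>, size <bytes>
# Assumption: integer-epoch mtime; other Ceph releases may print a formatted date.
STAT_RE = re.compile(r"^(?P<pool>[^/\s]+)/(?P<obj>\S+) mtime (?P<mtime>\d+), size (?P<size>\d+)$")


def parse_stat(line):
    """Parse one `rados stat` line into (object name, mtime, size)."""
    m = STAT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised rados stat line: {line!r}")
    return m.group("obj"), int(m.group("mtime")), int(m.group("size"))


def check_parts(lines):
    """Return (zero-sized objects, objects grouped by mtime)."""
    empty = []
    by_mtime = defaultdict(list)
    for line in lines:
        obj, mtime, size = parse_stat(line)
        by_mtime[mtime].append(obj)
        if size == 0:
            empty.append(obj)
    return empty, dict(by_mtime)


prefix = ".rgw.buckets/default.14654.445__"
name = "c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57"
lines = [
    f"{prefix}shadow_{name}_2 mtime 1456231768, size 2097152   ",
    f"{prefix}shadow_{name}_1 mtime 1456231768, size 4194304",
    f"{prefix}multipart_{name} mtime 1456231800, size 0",
]
empty, by_mtime = check_parts(lines)
print(empty)                                            # only the __multipart_ head of part 57
print({t: len(objs) for t, objs in by_mtime.items()})   # {1456231768: 2, 1456231800: 1}
```

Grouping by mtime makes the pattern above visible at a glance: the shadow objects share the mtime of the failed request, while the empty head has a later one.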

Regards,
SR
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


