Problem: silently corrupted RadosGW objects caused by slow requests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

 

We have two CEPH clusters running on Dumpling 0.67.11 and some of our "multipart objects" are incompleted. It seems that some slow requests could cause corruption of related S3 objects. Moveover GETs for that objects are working without any error messages. There are only HTTP 200 in logs as well as no information about problems from popular client tools/libs.

 

The situation looks very similiar to described in bug #8269, but we are using fixed 0.67.11 version:  http://tracker.ceph.com/issues/8269

 

Regards,

 

Sławomir Ritter

 

 

 

EXAMPLE#1

 

slow_request

============

2016-02-23 13:49:58.818640 osd.260 10.176.67.27:6800/688083 2119 : [WRN] 4 slow requests, 4 included below; oldest blocked for > 30.727096 secs

2016-02-23 13:49:58.818673 osd.260 10.176.67.27:6800/688083 2120 : [WRN] slow request 30.727096 seconds old, received at 2016-02-23 13:49:28.091460: osd_op(c

lient.47792965.0:185007087 default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2 [writef

ull 0~524288] 10.ce729ebe e107594) v4 currently waiting for subops from [469,9]

 

 

HTTP_500 in apache.log

======================

127.0.0.1 - - [23/Feb/2016:13:49:27 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=56 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 Linux/3.13.0-39-generic(syncworker)"

127.0.0.1 - - [23/Feb/2016:13:49:28 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57 HTTP/1.0" 500 751 "-" "Boto/2.31.1 Python/2.7.3 Linux/3.13.0-39-generic(syncworker)"

127.0.0.1 - - [23/Feb/2016:13:49:58 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=57 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 Linux/3.13.0-39-generic(syncworker)"

127.0.0.1 - - [23/Feb/2016:13:49:59 +0100] "PUT /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv?uploadId=b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z&partNumber=58 HTTP/1.0" 200 221 "-" "Boto/2.31.1 Python/2.7.3 Linux/3.13.0-39-generic(syncworker)"

 

 

Empty RADOS object (real size = 0 bytes), list generated basis on MANIFEST

==========================================================================

found  default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.56_2  2097152   ok          2097152       10.7acc9476 (10.1476) [278,142,436] [278,142,436]

found  default.14654.445__multipart_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57 0         diff        4194304       10.4f5be025 (10.25)   [57,310,428]  [57,310,428]

found  default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_1  4194304   ok          4194304       10.81191602 (10.1602) [441,109,420] [441,109,420]

found  default.14654.445__shadow_c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv.b73N3OmW4OhCjDYSR-RTkZNNIKA1C9Z.57_2  2097152   ok          2097152       10.ce729ebe (10.1ebe) [260,469,9]   [260,469,9]

 

 

"Silent" GETs

=============

# object size from headers

$ s3 -u head video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv                                                               Content-Type: binary/octet-stream

Content-Length: 641775701

Server: nginx

 

# but GETs only 637581397 (641775701 - missing 4194304 = 637581397)

$ s3 -u get video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv > /tmp/test

$  ls -al /tmp/test

-rw-r--r-- 1 root root 637581397 Feb 23 17:05 /tmp/test

 

# no error in logs

127.0.0.1 - - [23/Feb/2016:17:05:00 +0100] "GET /video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv HTTP/1.0" 200 637581711 "-" "Mozilla/4.0 (Compatible; s3; libs3 2.0; Linux x86_64)"

 

# wget - retry for missing part, but there is no missing part, so it GETs head/tail of the file again....

$ wget http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv

--2016-02-23 17:10:11--  http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv

Connecting to 127.0.0.1:88... connected.

HTTP request sent, awaiting response... 200 OK

Length: 641775701 (612M) [binary/octet-stream]

Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv'

 

99% [==================================================================================================================> ] 637,581,397 63.9M/s   in 9.5s

 

2016-02-23 17:10:20 (64.1 MB/s) - Connection closed at byte 637581397. Retrying.

 

--2016-02-23 17:10:21--  (try: 2)  http://127.0.0.1:88/video-shbc/c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv

Connecting to 127.0.0.1:88... connected.

HTTP request sent, awaiting response... 206 Partial Content

Length: 641775701 (612M), 4194304 (4.0M) remaining [binary/octet-stream]

Saving to: `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv'

 

100%[+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>] 641,775,701 --.-K/s   in 0.007s

 

2016-02-23 17:10:21 (601 MB/s) - `c9f8db1b-cee2-4ec8-8fb3-8b4bc7585d80.1456231572.877051.ismv' saved [641775701/641775701]

 

 

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux