We are investigating a curious problem with radosgw: We see intermittent timeouts and http connections breaking when streaming video files through the rados gateway. On server 1 we have Ubuntu 13.10 (saucy) with the stock Apache 2.4 and associated fastcgi (and a mon) On server 2 we also have Ubuntu 13.10 but installed the ceph patched Apache 2.2 with fastcgi (and a mon) We can now do the following: wget a 700 MB file from Server 2 ( wget --no-check-certificate https://staging.server2.example.ch/2f/e4491dbfa00c328828bbbc2c8d128a/test2.mp4 -O /dev/null ) Stream the same file trough Chrome from server 1: -> the wget process is interrupted: 2013-12-05 16:32:18 (364 KB/s) - Connection closed at byte 172487549. Retrying. the apache error log shows this: [Thu Dec 05 16:32:50 2013] [error] [client 2001xxxx:6502] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec) [Thu Dec 05 16:32:50 2013] [error] [client 2001:xxx:6502] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi" [Thu Dec 05 16:55:54 2013] [error] [client 123.123.123.123] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec) [Thu Dec 05 16:55:54 2013] [error] [client 123.123.123.123] Handler for fastcgi-script returned invalid result code 1 [Thu Dec 05 16:56:25 2013] [error] [client 123.123.123.123] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec) [Thu Dec 05 16:56:25 2013] [error] [client 123.123.123.123] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi" radosgw log shows this (from another test, that's why timestamps don't match) 2013-12-05 17:14:02.675911 7f2b9cfeb700 0 ERROR: s->cio->print() returned err=-1 2013-12-05 17:14:02.675942 7f2b9cfeb700 0 ERROR: s->cio->print() returned err=-1 2013-12-05 17:14:02.675946 7f2b9cfeb700 0 ERROR: s->cio->print() returned err=-1 2013-12-05 17:14:02.675948 7f2b9cfeb700 0 ERROR: s->cio->print() returned err=-1 2013-12-05 17:14:02.675973 7f2b9cfeb700 2 req 5:8.176355:s3:GET /2f/e4491dbfa00c328828bbbc2c8d128a/test2.mp4:get_obj:http status=403 2013-12-05 17:14:02.675971 7f2b597fa700 20 rados->get_obj_iterate_cb oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_5 obj-ofs=17301504 read_ofs=0 len=4194304 2013-12-05 17:14:02.676068 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0 2013-12-05 17:14:02.676299 7f2b9cfeb700 1 ====== req done req=0x1d60d40 http_status=403 ====== 2013-12-05 17:14:08.909528 7f2bbcbc0700 2 RGWDataChangesLog::ChangesRenewThread: start 2013-12-05 17:14:30.909803 7f2bbcbc0700 2 RGWDataChangesLog::ChangesRenewThread: start 2013-12-05 17:14:32.321141 7f2bbd3c1700 20 get_obj_aio_completion_cb: io completion ofs=13107200 len=4194304 2013-12-05 17:14:32.321176 7f2bbd3c1700 20 get_obj_aio_completion_cb: io completion ofs=8912896 len=4194304 2013-12-05 17:14:32.321185 7f2bbd3c1700 20 get_obj_aio_completion_cb: io completion ofs=4718592 len=4194304 2013-12-05 17:14:32.321188 7f2bbd3c1700 20 get_obj_aio_completion_cb: io completion ofs=8912896 len=4194304 2013-12-05 17:14:32.321194 7f2bbd3c1700 20 get_obj_aio_completion_cb: io completion ofs=138936320 len=4194304 2013-12-05 17:14:32.321248 7f2b597fa700 20 rados->get_obj_iterate_cb oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_6 obj-ofs=21495808 read_ofs=0 len=4194304 2013-12-05 17:14:32.321281 7f2b22fad700 20 rados->get_obj_iterate_cb oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_38 obj-ofs=155713536 read_ofs=0 len=4194304 2013-12-05 17:14:32.321383 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0 2013-12-05 17:14:32.321436 7f2b597fa700 20 rados->get_obj_iterate_cb oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_7 obj-ofs=25690112 read_ofs=0 len=4194304 2013-12-05 17:14:32.321530 7f2b22fad700 20 rados->aio_operate r=0 bl.length=0 2013-12-05 17:14:32.321672 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0 2013-12-05 17:14:32.321690 7f2b597fa700 20 rados->get_obj_iterate_cb oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_8 obj-ofs=29884416 read_ofs=0 len=4194304 2013-12-05 17:14:32.321779 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0 2013-12-05 17:14:35.635836 7f2bbd3c1700 20 get_obj_aio_completion_cb: io completion ofs=147324928 len=4194304 2013-12-05 17:14:35.635881 7f2bbd3c1700 20 get_obj_aio_completion_cb: io completion ofs=143130624 len=4194304 2013-12-05 17:14:35.635886 7f2bbd3c1700 20 get_obj_aio_completion_cb: io completion ofs=17301504 len=4194304 2013-12-05 17:14:35.635961 7f2b22fad700 20 rados->get_obj_iterate_cb oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_39 obj-ofs=159907840 read_ofs=0 len=4194304 2013-12-05 17:14:35.636133 7f2b22fad700 20 rados->aio_operate r=0 bl.length=0 2013-12-05 17:14:35.636157 7f2b22fad700 20 rados->get_obj_iterate_cb oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_40 obj-ofs=164102144 read_ofs=0 len=4194304 2013-12-05 17:14:35.636226 7f2b22fad700 20 rados->aio_operate r=0 bl.length=0 2013-12-05 17:14:35.636551 7f2bbd3c1700 20 get_obj_aio_completion_cb: io completion ofs=17301504 len=4194304 2013-12-05 17:14:35.636645 7f2b597fa700 20 rados->get_obj_iterate_cb oid=default.40804.6__shadow__W8D84-M3taNmGJG4UCxDxbmNDJqubhP_9 obj-ofs=34078720 read_ofs=0 len=4194304 2013-12-05 17:14:35.636764 7f2b597fa700 20 rados->aio_operate r=0 bl.length=0 2013-12-05 17:14:52.909931 7f2bbcbc0700 2 RGWDataChangesLog::ChangesRenewThread: start 2013-12-05 17:15:06.803464 7f2bc96237c0 20 enqueued request req=0x1d62fe0 2013-12-05 17:15:06.803499 7f2bc96237c0 20 RGWWQ: 2013-12-05 17:15:06.803503 7f2bc96237c0 20 req: 0x1d62fe0 2013-12-05 17:15:06.803511 7f2bc96237c0 10 allocated request req=0x1d64bc0 2013-12-05 17:15:06.803559 7f2b5affd700 20 dequeued request req=0x1d62fe0 2013-12-05 17:15:06.803574 7f2b5affd700 20 RGWWQ: empty 2013-12-05 17:15:06.803692 7f2b5affd700 2 req 7:0.000110::GET /2f/e4491dbfa00c328828bbbc2c8d128a/test2.mp4::initializing 2013-12-05 17:15:06.803702 7f2b5affd700 10 host=xxxxxx rgw_dns_name=xxxxxx Ceph Versions: root@server2:/var/log/apache2# radosgw --version ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7) root@server1:/etc# ceph -s cluster 6b3bd327-2f97-44f6-a8fc-2af8534a2e67 health HEALTH_OK monmap e1: 3 mons at {h0s=[2001:]:6789/0,h1s=[2001:]:6789/0,hxs=[2001:]:6789/0}, election epoch 1270, quorum 0,1,2 server2, server3, server1 osdmap e6645: 24 osds: 24 up, 24 in pgmap v2541337: 7368 pgs: 7368 active+clean; 2602 GB data, 5213 GB used, 61822 GB / 67035 GB avail; 31013KB/s rd, 151KB/s wr, 34op/s mdsmap e1: 0/0/1 up root@server1:/etc# ceph --version ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7) more log files available upon request…. any ideas? cheers Jens-Christian -- SWITCH Jens-Christian Fischer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland phone +41 44 268 15 15, direct +41 44 268 15 71 jens-christian.fischer@xxxxxxxxx |
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com