Hello Ceph-Users,

I was testing our RADOS Gateway, and after a few hours rgw started returning HTTP 500 responses for certain uploads. After some digging, I found that an HDD had died. The OSD was eventually marked out, but not before a short rgw outage. From start to finish, this took 60 to 120 seconds.

I have a few questions:

1) FastCGI timed out after 30 seconds. If I raise the timeout to 120 seconds, will that protect me from future HDD failures? Example of the error.log from Apache:

[error] [client 10.194.255.14] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
[error] [client 10.194.255.1] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)

2) Why did it take so long for Ceph to recover?

3) Is there anything I can do to improve resiliency to HDD failures?

Thank you.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com