Bad drive caused radosgw to time out with HTTP 500s

Hello Ceph-Users,

I was testing our RADOS gateway, and after a few hours rgw started returning HTTP 500 responses for certain uploads. I did some digging and found that an HDD had died. The OSD was marked out, but not before a short rgw outage. Start to finish, the outage lasted 60 to 120 seconds.

I have a few questions:

1) FastCGI timed out after 30 seconds. If I raise the timeout to 120 seconds, will that protect me from future HDD failures?
	Example from the Apache error.log:

	[error] [client 10.194.255.14] FastCGI: incomplete headers (0 bytes) received from server "/var/www/s3gw.fcgi"
	[error] [client 10.194.255.1] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
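
	For anyone who wants to check their own logs for the same symptom, here is a minimal sketch in Python that counts idle-timeout errors per client. It assumes the error.log lines look like the two above; the function name and the regular expression are my own illustration, not part of Apache or Ceph.

```python
import re
from collections import Counter

# Matches the "idle timeout" error line shown above.
# The pattern is an assumption based on that one sample line.
IDLE_TIMEOUT = re.compile(
    r'\[client (?P<client>[\d.]+)\] FastCGI: comm with server '
    r'"(?P<server>[^"]+)" aborted: idle timeout \((?P<secs>\d+) sec\)'
)

def count_idle_timeouts(lines):
    """Count idle-timeout errors, keyed by (client IP, timeout in seconds)."""
    hits = Counter()
    for line in lines:
        m = IDLE_TIMEOUT.search(line)
        if m:
            hits[(m.group("client"), int(m.group("secs")))] += 1
    return hits

# The two log lines from this message, used as sample input.
sample = [
    '[error] [client 10.194.255.14] FastCGI: incomplete headers (0 bytes) '
    'received from server "/var/www/s3gw.fcgi"',
    '[error] [client 10.194.255.1] FastCGI: comm with server '
    '"/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)',
]

if __name__ == "__main__":
    for (client, secs), n in count_idle_timeouts(sample).items():
        print(f"{client}: {n} idle timeout(s) at {secs} sec")
```

	To run it against a real log, pass an open file object instead of the sample list, since the function only iterates over lines.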

2) Why did it take so long for Ceph to recover? 

3) Is there anything I can do to improve resiliency against HDD failures?

Thank you. 

