Hi,

Following discussions with people on IRC, I set debug_ms, and this is what loops over and over when one of the requests is stuck: http://pastebin.com/KVcpAeYT

Regarding the modules, the apache version is 2.2.22-2precise.ceph and the fastcgi mod version is 2.4.7~0910052141-2~bpo70+1.ceph.

Best regards
Graeme

> On Wed, Jan 22, 2014 at 8:05 AM, Graeme Lambert <glambert@xxxxxxxxxxx> wrote:
>> Hi,
>>
>> I'm using the aws-sdk-for-php classes for the Ceph RADOS gateway, but I'm getting an intermittent issue when uploading files.
>>
>> I'm attempting to upload an array of objects to Ceph one by one using the create_object() function. It appears to stop randomly while working through them: it could stop at the first one, somewhere in between, or at the last one. There is no pattern to it that I can see.
>>
>> I'm not getting any PHP errors that indicate an issue, and equally there are no exceptions being caught.
>>
>> In the radosgw log file, at the time it appears stuck, I get:
>>
>>   2014-01-22 15:39:21.656763 7fac44fe1700 1 ====== starting new request req=0x2417c30 =====
>>
>> And then sometimes I see:
>>
>>   2014-01-22 15:40:42.490485 7fac99ff9700 1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7fac51ffb700' had timed out after 600
>>
>> repeated over and over again.
>>
>> When those messages are appearing, Apache's error log shows:
>>
>>   [Wed Jan 22 15:43:11 2014] [error] [client 172.16.2.149] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec), referer: https://[server]/[path]
>>
>> equally over and over again.
>>
>> I have restarted apache, radosgw, all Ceph OSDs and ceph-mon processes, and still no joy with this.
>>
>> Can anyone advise on where I'm going wrong with this?
>
> Which fastcgi module are you using? Can you provide a log with 'debug ms = 1' for a failing request? Usually that kind of message means that it's waiting for the osd to respond, which might point at an unhealthy cluster.
>
> Yehuda
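
For anyone following along, the 'debug ms = 1' setting requested above normally goes in ceph.conf on the gateway host, and radosgw is restarted afterwards. A minimal sketch, assuming the gateway instance is called client.radosgw.gateway (that section name is an assumption and is not taken from this thread; substitute the name of your own radosgw instance):

```ini
# Assumed section name; match it to your own radosgw instance.
[client.radosgw.gateway]
    # Log messenger traffic between radosgw and the OSDs/monitors.
    debug ms = 1
```

With this enabled, the radosgw log should show the messages exchanged with the OSDs while a request is stuck, which is the information the pastebin above is meant to capture.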
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com