I actually need to see what happens before it starts looping. What does
'ceph health' show?

Yehuda

On Wed, Jan 22, 2014 at 8:38 AM, Graeme Lambert <glambert@xxxxxxxxxxx> wrote:
> Hi,
>
> Following discussions with people in the IRC I set debug_ms and this is what
> is being looped over and over when one of them is stuck:
> http://pastebin.com/KVcpAeYT
>
> Regarding the modules, apache version is 2.2.22-2precise.ceph and the
> fastcgi mod version is 2.4.7~0910052141-2~bpo70+1.ceph.
>
> Best regards
>
> Graeme
>
>
> On 22/01/14 16:28, Yehuda Sadeh wrote:
>
> On Wed, Jan 22, 2014 at 8:05 AM, Graeme Lambert <glambert@xxxxxxxxxxx>
> wrote:
>
> Hi,
>
> I'm using the aws-sdk-for-php classes for the Ceph RADOS gateway but I'm
> getting an intermittent issue with the uploading files.
>
> I'm attempting to upload an array of objects to Ceph one by one using the
> create_object() function. It appears to stop randomly when attempting to do
> them all, it could stop at the first one, in between or the last one, there
> is no pattern to it that I can see.
>
> I'm not getting any PHP errors that indicate an issue and equally there are
> no exceptions being caught.
>
> In the radosgw log file, at the time it appears stuck I get:
>
> 2014-01-22 15:39:21.656763 7fac44fe1700 1 ====== starting new request
> req=0x2417c30 =====
>
> And then sometimes I see:
>
> 2014-01-22 15:40:42.490485 7fac99ff9700 1 heartbeat_map is_healthy
> 'RGWProcess::m_tp thread 0x7fac51ffb700' had timed out after 600
>
> repeated over and over again.
>
> When those messages are appearing, Apache's error log shows:
>
> [Wed Jan 22 15:43:11 2014] [error] [client 172.16.2.149] FastCGI: comm with
> server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec), referer:
> https://[server]/[path]
>
> equally over and over again.
>
> I have restarted apache, radosgw, all Ceph OSDs and ceph-mon processes and
> still no joy with this.
>
> Can anyone advise on where I'm going wrong with this?
>
> Which fastcgi module are you using? Can you provide a log with 'debug
> ms = 1' for a failing request? Usually that kind of message means that
> it's waiting for the osd to respond, which might point at an
> unhealthy cluster.
>
> Yehuda
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
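
For anyone collecting the same diagnostics, the 'ceph health' check asked
about above can be scripted before retrying the uploads. This is only a
minimal sketch, not something posted in the thread: it assumes the output
begins with one of HEALTH_OK, HEALTH_WARN or HEALTH_ERR, the function names
(parse_ceph_health, cluster_is_healthy, read_ceph_health) are hypothetical,
and the sample strings in the usage note are made up for illustration.

```python
import subprocess

# Status tokens that `ceph health` prints at the start of its output.
HEALTH_STATES = ("HEALTH_OK", "HEALTH_WARN", "HEALTH_ERR")


def parse_ceph_health(output):
    """Split the first line of `ceph health` output into (status, detail)."""
    lines = output.strip().splitlines()
    if not lines:
        raise ValueError("empty health output")
    status, _, detail = lines[0].partition(" ")
    if status not in HEALTH_STATES:
        raise ValueError("unrecognised health status: %r" % status)
    return status, detail.strip()


def cluster_is_healthy(output):
    """Return True only when the cluster reports HEALTH_OK."""
    return parse_ceph_health(output)[0] == "HEALTH_OK"


def read_ceph_health():
    """Run `ceph health` locally (needs the ceph CLI and a usable keyring)."""
    result = subprocess.run(
        ["ceph", "health"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

With an illustrative string such as "HEALTH_WARN 12 pgs degraded",
parse_ceph_health returns the status and the remaining detail text, and
cluster_is_healthy returns False. A status other than HEALTH_OK would fit
the suggestion above that radosgw is waiting on an OSD.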