Re: radosgw daemon stalls on download of some files

Sebastian <webmaster@xxxxxxxx> · Fri, 29 Nov 2013 16:28:50 +0100

Hi,

Thanks for the hint. I tried this again and noticed that the timeout message does seem to be unrelated. Here is the log file for a stalling request with debug turned on:
http://pastebin.com/DcQuc9wP
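
For anyone following along: as far as I understand it, the setting from Yehuda's reply goes into the gateway's section of ceph.conf roughly like this. The section name is only an assumption; it has to match whatever your radosgw instance is actually called:

    # section name is an assumption, adjust it to your gateway's name
    [client.radosgw.gateway]
        debug ms = 1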

I cannot find an actual "error" in the log, but the download still stalls at about 500 kB. Restarting radosgw fixes it for one download only; the next one is broken again. As I said, though, this does not happen for all files.

Sebastian

On 27.11.2013, at 21:53, Yehuda Sadeh wrote:

> On Wed, Nov 27, 2013 at 4:46 AM, Sebastian <webmaster@xxxxxxxx> wrote:
>> Hi,
>> 
>> we have a setup of four servers running Ceph and radosgw. We use it as an internal S3 service for our files. The servers run Debian Squeeze with Ceph 0.67.4.
>> 
>> The cluster has been running smoothly for quite a while, but we are currently experiencing issues with radosgw. For some files the HTTP download just stalls at around 500 kB.
>> 
>> The Apache error log just says:
>> [error] [client ] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)
>> [error] [client ] Handler for fastcgi-script returned invalid result code 1
>> 
>> radosgw logging:
>> 7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f00934bb700' had timed out after 600
>> 7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f00ab4eb700' had timed out after 600
>> 
>> The interesting thing is that the cluster health is fine and only some files are affected. Most of them work fine. A restart of radosgw fixes the issue. The other Ceph logs are also clean.
>> 
>> Any idea why this happens?
>> 
> 
> No, but you can turn on 'debug ms = 1' in your gateway's ceph.conf, and
> that might give some better indication.
> 
> Yehuda

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com