Re: radosgw daemon stalls on download of some files

Artem Silenkov <artem.silenkov@xxxxxxxxx> · Fri, 29 Nov 2013 22:46:02 +0700

Good day!

We've noticed similar behavior recently during OSD recovery activity such as scrubbing. Restarting the OSD did the trick. We even had 404 errors until deep scrubbing ended.

Any noise in ceph -w?
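
To see how many radosgw worker threads are affected, the heartbeat lines in the original report quoted below could be tallied with a short script. This is only a rough sketch: the pattern is taken from the two log lines in this thread and may not match other Ceph versions, and the helper name `stalled_threads` is made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the two radosgw log lines quoted in this thread;
# this is an assumption about the format, not a documented Ceph log schema.
HEARTBEAT_RE = re.compile(
    r"heartbeat_map is_healthy '(?P<pool>\S+) thread (?P<thread>0x[0-9a-f]+)'"
    r" had timed out after (?P<secs>\d+)"
)


def stalled_threads(lines):
    """Count heartbeat timeout messages per worker thread (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            # Key on the thread address reported inside the quoted thread name.
            counts[match.group("thread")] += 1
    return counts
```

It takes any iterable of lines, so an open log file can be passed directly; the location of the radosgw log depends on the local ceph.conf. If the same thread addresses keep reappearing, that would suggest the threads are stuck for good rather than just slow.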

Regards, Artem S.

On 29 Nov 2013 at 22:28, "Sebastian" <webmaster@xxxxxxxx> wrote:

>

> Hi,

>

> Thanks for the hint. I tried this again and noticed that the timeout message seems to be unrelated. Here is the log file for a stalling request with debug turned on:

> http://pastebin.com/DcQuc9wP

>

> I cannot find an actual "error" in the log. The download stalls at about 500 kB at that point, though. Restarting radosgw fixes it for one download only; the next one is broken again. But as I said, this does not happen for all files.

>

> Sebastian

>

> On 27.11.2013, at 21:53, Yehuda Sadeh wrote:

>

> > On Wed, Nov 27, 2013 at 4:46 AM, Sebastian <webmaster@xxxxxxxx> wrote:

> >> Hi,

> >>

> >> We have a setup of 4 servers running Ceph and radosgw. We use it as an internal S3 service for our files. The servers run Debian Squeeze with Ceph 0.67.4.

> >>

> >> The cluster has been running smoothly for quite a while, but we are currently experiencing issues with radosgw. For some files the HTTP download just stalls at around 500 kB.

> >>

> >> The Apache error log just says:

> >> [error] [client ] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec)

> >> [error] [client ] Handler for fastcgi-script returned invalid result code 1

> >>

> >> radosgw logging:

> >> 7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f00934bb700' had timed out after 600

> >> 7f00bc66a700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7f00ab4eb700' had timed out after 600

> >>

> >> The interesting thing is that the cluster health is fine and only some files are not working properly. Most of them work fine. A restart of radosgw fixes the issue. The other Ceph logs are also clean.

> >>

> >> Any idea why this happens?

> >>

> >

> > No, but you can turn on 'debug ms = 1' on your gateway ceph.conf, and

> > that might give some better indication.

> >

> > Yehuda

>

> _______________________________________________

> ceph-users mailing list

> ceph-users@xxxxxxxxxxxxxx

> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
