Re: RADOS Gateway Issues


 



On Wed, Jan 22, 2014 at 8:55 AM, Graeme Lambert <glambert@xxxxxxxxxxx> wrote:
> Hi Yehuda,
>
> Regarding the health status of the cluster: it isn't healthy, but I haven't
> found any way of fixing the placement group errors.  Looking at the ceph
> health detail output, it's also showing blocked requests:
>
> HEALTH_WARN 1 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck
> unclean; 7 requests are blocked > 32 sec; 3 osds have slow requests; pool
> cloudstack has too few pgs; pool .rgw.buckets has too few pgs
> pg 14.0 is stuck inactive since forever, current state incomplete, last
> acting [5,0]
> pg 14.2 is stuck inactive since forever, current state incomplete, last
> acting [0,5]
> pg 14.6 is stuck inactive since forever, current state down+incomplete, last
> acting [4,2]
> pg 14.0 is stuck unclean since forever, current state incomplete, last
> acting [5,0]
> pg 14.2 is stuck unclean since forever, current state incomplete, last
> acting [0,5]
> pg 14.6 is stuck unclean since forever, current state down+incomplete, last
> acting [4,2]
> pg 14.0 is incomplete, acting [5,0]
> pg 14.2 is incomplete, acting [0,5]
> pg 14.6 is down+incomplete, acting [4,2]


You should figure these out first before trying to get the gateway
working. They may very well be your culprit.

Yehuda
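
A rough sketch of where one might start digging into PGs in that state (the
pg IDs come from the health output above; everything else is generic):

    # Ask each problem PG why it is incomplete/down; look at the
    # "recovery_state" and "down_osds_we_would_probe" sections of the output.
    ceph pg 14.0 query
    ceph pg 14.2 query
    ceph pg 14.6 query

    # Cross-check what is stuck and whether all OSDs are up and in.
    ceph pg dump_stuck inactive
    ceph osd tree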

> 3 ops are blocked > 8388.61 sec
> 3 ops are blocked > 4194.3 sec
> 1 ops are blocked > 2097.15 sec
> 1 ops are blocked > 8388.61 sec on osd.0
> 1 ops are blocked > 4194.3 sec on osd.0
> 2 ops are blocked > 8388.61 sec on osd.4
> 2 ops are blocked > 4194.3 sec on osd.5
> 1 ops are blocked > 2097.15 sec on osd.5
> 3 osds have slow requests
> pool cloudstack objects per pg (37316) is more than 27.1587 times cluster
> average (1374)
> pool .rgw.buckets objects per pg (76219) is more than 55.4723 times cluster
> average (1374)
>
>
> Ignore the cloudstack pool; I was using CloudStack but am not anymore, so
> it's an inactive pool.
>
> Best regards
>
> Graeme
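
As for the "too few pgs" warning on .rgw.buckets in the output above, a
minimal sketch of how the PG count could be raised once the incomplete PGs
are resolved (256 is only an illustrative target; pg_num can only ever be
increased, pgp_num has to be bumped to match, and the split will trigger
data movement):

    ceph osd pool get .rgw.buckets pg_num
    ceph osd pool set .rgw.buckets pg_num 256
    ceph osd pool set .rgw.buckets pgp_num 256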
>
>
>
> On 22/01/14 16:38, Graeme Lambert wrote:
>
> Hi,
>
> Following discussions with people on IRC, I set debug_ms, and this is what
> is logged over and over when one of the requests is stuck:
> http://pastebin.com/KVcpAeYT
>
> Regarding the modules, the Apache version is 2.2.22-2precise.ceph and the
> fastcgi module version is 2.4.7~0910052141-2~bpo70+1.ceph.
>
> Best regards
>
> Graeme
>
>
> On 22/01/14 16:28, Yehuda Sadeh wrote:
>
> On Wed, Jan 22, 2014 at 8:05 AM, Graeme Lambert <glambert@xxxxxxxxxxx>
> wrote:
>
> Hi,
>
> I'm using the aws-sdk-for-php classes with the Ceph RADOS gateway, but I'm
> getting an intermittent issue when uploading files.
>
> I'm attempting to upload an array of objects to Ceph one by one using the
> create_object() function.  It appears to stop randomly partway through: it
> could stop at the first object, somewhere in the middle, or at the last one,
> and there is no pattern to it that I can see.
>
> I'm not getting any PHP errors that indicate an issue and equally there are
> no exceptions being caught.
>
> In the radosgw log file, at the time it appears stuck I get:
>
> 2014-01-22 15:39:21.656763 7fac44fe1700  1 ====== starting new request
> req=0x2417c30 =====
>
> And then sometimes I see:
>
> 2014-01-22 15:40:42.490485 7fac99ff9700  1 heartbeat_map is_healthy
> 'RGWProcess::m_tp thread 0x7fac51ffb700' had timed out after 600
>
> repeated over and over again.
>
> When those messages are appearing, Apache's error log shows:
>
> [Wed Jan 22 15:43:11 2014] [error] [client 172.16.2.149] FastCGI: comm with
> server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec), referer:
> https://[server]/[path]
>
> equally over and over again.
>
> I have restarted apache, radosgw, all Ceph OSDs and ceph-mon processes and
> still no joy with this.
>
> Can anyone advise on where I'm going wrong with this?
>
> Which fastcgi module are you using? Can you provide a log with 'debug
> ms = 1' for a failing request? Usually that kind of message means that
> it's waiting for the OSD to respond, which might point at an
> unhealthy cluster.
>
> Yehuda
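
For reference, a minimal sketch of turning that logging on for the gateway,
assuming the instance is defined as [client.radosgw.gateway] in ceph.conf
(the section name and init script may differ on other setups):

    # Add to the radosgw section of /etc/ceph/ceph.conf:
    #   [client.radosgw.gateway]
    #       debug ms = 1
    #       debug rgw = 20
    # then restart the gateway so it re-reads the config, e.g.:
    sudo service radosgw restart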
>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



