Re: RADOS Gateway Issues

Graeme Lambert <glambert@xxxxxxxxxxx> · Wed, 22 Jan 2014 16:55:28 +0000

    Hi Yehuda,

      With regard to the health of the cluster: it isn't healthy, and I
      haven't found any way of fixing the placement group errors.
      Looking at ceph health detail, it's also reporting blocked
      requests:

      HEALTH_WARN 1 pgs down; 3 pgs incomplete; 3 pgs stuck inactive; 3 pgs stuck unclean; 7 requests are blocked > 32 sec; 3 osds have slow requests; pool cloudstack has too few pgs; pool .rgw.buckets has too few pgs
      pg 14.0 is stuck inactive since forever, current state incomplete, last acting [5,0]
      pg 14.2 is stuck inactive since forever, current state incomplete, last acting [0,5]
      pg 14.6 is stuck inactive since forever, current state down+incomplete, last acting [4,2]
      pg 14.0 is stuck unclean since forever, current state incomplete, last acting [5,0]
      pg 14.2 is stuck unclean since forever, current state incomplete, last acting [0,5]
      pg 14.6 is stuck unclean since forever, current state down+incomplete, last acting [4,2]
      pg 14.0 is incomplete, acting [5,0]
      pg 14.2 is incomplete, acting [0,5]
      pg 14.6 is down+incomplete, acting [4,2]

      3 ops are blocked > 8388.61 sec
      3 ops are blocked > 4194.3 sec
      1 ops are blocked > 2097.15 sec
      1 ops are blocked > 8388.61 sec on osd.0
      1 ops are blocked > 4194.3 sec on osd.0
      2 ops are blocked > 8388.61 sec on osd.4
      2 ops are blocked > 4194.3 sec on osd.5
      1 ops are blocked > 2097.15 sec on osd.5
      3 osds have slow requests

      pool cloudstack objects per pg (37316) is more than 27.1587 times cluster average (1374)
      pool .rgw.buckets objects per pg (76219) is more than 55.4723 times cluster average (1374)
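
      To see how the lines above relate to each other, here is a rough
      Python sketch that parses the incomplete-PG lines and the per-OSD
      blocked-op lines. The summarize() helper and its regexes are
      hypothetical, not part of Ceph, and assume exactly the line formats
      pasted above. It shows that all three PGs belong to pool 14 (the
      part of a PG id before the dot is the pool id), that the per-OSD
      counts add up to the 7 blocked requests in the summary, and that
      every OSD with blocked ops (0, 4 and 5) is in the acting set of one
      of the incomplete PGs. The oldest ops have been blocked for more
      than 8388 sec, i.e. over two hours. Which pool has id 14 isn't
      shown here; ceph osd lspools lists the mapping.

```python
import re
from collections import Counter

# Lines copied from the `ceph health detail` output above.
HEALTH_DETAIL = """\
pg 14.0 is incomplete, acting [5,0]
pg 14.2 is incomplete, acting [0,5]
pg 14.6 is down+incomplete, acting [4,2]
1 ops are blocked > 8388.61 sec on osd.0
1 ops are blocked > 4194.3 sec on osd.0
2 ops are blocked > 8388.61 sec on osd.4
2 ops are blocked > 4194.3 sec on osd.5
1 ops are blocked > 2097.15 sec on osd.5
"""

# "pg <pool>.<pg in hex> is <state>, acting [<osd>,...]"
PG_LINE = re.compile(r"^pg (\d+)\.([0-9a-f]+) is (\S+), acting \[([\d,]+)\]$")
# "<n> ops are blocked > <secs> sec on osd.<id>"
OPS_LINE = re.compile(r"^(\d+) ops are blocked > ([\d.]+) sec on osd\.(\d+)$")


def summarize(text):
    """Hypothetical helper: return (pg_states, acting_osds, per_osd, per_threshold)."""
    pg_states, acting_osds = {}, set()
    per_osd, per_threshold = Counter(), Counter()
    for line in text.splitlines():
        m = PG_LINE.match(line)
        if m:
            pool, pg, state, acting = m.groups()
            pg_states[f"{pool}.{pg}"] = state
            acting_osds.update(int(x) for x in acting.split(","))
            continue
        m = OPS_LINE.match(line)
        if m:
            count, secs, osd = int(m.group(1)), m.group(2), int(m.group(3))
            per_osd[osd] += count          # blocked ops per OSD
            per_threshold[secs] += count   # blocked ops per age bucket
    return pg_states, acting_osds, per_osd, per_threshold


pg_states, acting, per_osd, per_threshold = summarize(HEALTH_DETAIL)
print(pg_states)               # {'14.0': 'incomplete', '14.2': 'incomplete', '14.6': 'down+incomplete'}
print(sorted(acting))          # [0, 2, 4, 5]
print(sum(per_osd.values()))   # 7, matching "7 requests are blocked"
print(dict(per_threshold))     # {'8388.61': 3, '4194.3': 3, '2097.15': 1}
print(set(per_osd) <= acting)  # True: each OSD with blocked ops is in a stuck PG's acting set
```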

      Ignore the cloudstack pool: I was using CloudStack but no longer
      am, so it's an inactive pool.
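
      For reference, the "too few pgs" figures are plain ratios: a pool's
      objects per PG divided by the cluster-wide average. A quick Python
      check using only the numbers from the output above:

```python
# Figures copied from the health detail output above.
CLUSTER_AVG_OBJECTS_PER_PG = 1374
objects_per_pg = {"cloudstack": 37316, ".rgw.buckets": 76219}

# Skew = pool's objects per PG / cluster-wide average objects per PG.
skew = {pool: round(n / CLUSTER_AVG_OBJECTS_PER_PG, 4) for pool, n in objects_per_pg.items()}
for pool, ratio in skew.items():
    print(pool, ratio)
# cloudstack 27.1587
# .rgw.buckets 55.4723
```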

        Best regards
        Graeme

      On 22/01/14 16:38, Graeme Lambert wrote:

      Hi,

        Following discussions with people on IRC, I set debug_ms, and
        this is what gets logged over and over whenever one of the
        requests is stuck:

        http://pastebin.com/KVcpAeYT

        Regarding the modules: the Apache version is 2.2.22-2precise.ceph
        and the mod_fastcgi version is 2.4.7~0910052141-2~bpo70+1.ceph.

          Best regards
          Graeme  

        On 22/01/14 16:28, Yehuda Sadeh wrote:

        On Wed, Jan 22, 2014 at 8:05 AM, Graeme Lambert <glambert@xxxxxxxxxxx> wrote:

          Hi,

I'm using the aws-sdk-for-php classes with the Ceph RADOS gateway, but I'm
getting an intermittent issue when uploading files.

I'm attempting to upload an array of objects to Ceph one by one using the
create_object() function. It stops at a random point while working through
them: it could stop at the first one, one in the middle or the last one, and
there is no pattern to it that I can see.

I'm not getting any PHP errors that indicate an issue, and no exceptions are
being caught either.

In the radosgw log file, at the point where it appears stuck, I get:

2014-01-22 15:39:21.656763 7fac44fe1700  1 ====== starting new request req=0x2417c30 =====

And then sometimes I see:

2014-01-22 15:40:42.490485 7fac99ff9700  1 heartbeat_map is_healthy 'RGWProcess::m_tp thread 0x7fac51ffb700' had timed out after 600

repeated over and over again.

When those messages are appearing, Apache's error log shows:

[Wed Jan 22 15:43:11 2014] [error] [client 172.16.2.149] FastCGI: comm with server "/var/www/s3gw.fcgi" aborted: idle timeout (30 sec), referer: https://[server]/[path]

also repeated over and over again.
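
One way to pin down which requests hang is to look for "starting new request"
lines that never get a matching completion line. This Python sketch ASSUMES
that radosgw also logs a "====== req done req=0x..." line when a request
finishes; check that against your own log before relying on it. The helper
name unfinished_requests() and the second request in the sample are
hypothetical. Because a req=0x... value is a memory address that may be
reused, the sketch keys on the address and drops an entry as soon as its
completion line appears.

```python
import re

# Start marker, as it appears in the radosgw log above.
START = re.compile(r"====== starting new request req=(0x[0-9a-f]+)")
# ASSUMPTION: completed requests log "====== req done req=0x..." at level 1.
DONE = re.compile(r"====== req done req=(0x[0-9a-f]+)")


def unfinished_requests(lines):
    """Hypothetical helper: map request address -> start time for open requests."""
    open_reqs = {}
    for line in lines:
        m = START.search(line)
        if m:
            open_reqs[m.group(1)] = line.split(" ")[1]  # time-of-day field
            continue
        m = DONE.search(line)
        if m:
            open_reqs.pop(m.group(1), None)  # request finished; forget it
    return open_reqs


sample = [
    # Hypothetical completed request, for illustration only.
    "2014-01-22 15:39:20.000000 7fac44fe1700  1 ====== starting new request req=0x1111111 =====",
    "2014-01-22 15:39:20.100000 7fac44fe1700  1 ====== req done req=0x1111111 http_status=200 ======",
    # The stuck request from the log above.
    "2014-01-22 15:39:21.656763 7fac44fe1700  1 ====== starting new request req=0x2417c30 =====",
]
print(unfinished_requests(sample))  # {'0x2417c30': '15:39:21.656763'}
```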

I have restarted Apache, radosgw, and all of the Ceph OSD and ceph-mon
processes, but still no joy.

Can anyone advise on where I'm going wrong with this?

        Which fastcgi module are you using? Can you provide a log with 'debug
ms = 1' for a failing request? Usually that kind of message means that
it's waiting for the OSD to respond, which might point at an
unhealthy cluster.

Yehuda

      _______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
