Re: Ceph osd is all up and in, but every pg is incomplete

Another strange thing is that the last few (24) pgs never seem to get ready and stay stuck at creating (after 6 hours of waiting):

[root@serverA ~]# ceph -s
2015-03-30 17:14:48.720396 7feb5bd7a700  0 -- :/1000964 >> 10.???.78:6789/0 pipe(0x7feb60026120 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7feb600263b0).fault
    cluster c09277a4-0eb9-41b1-b27f-a345c0169715
     health HEALTH_WARN 24 pgs peering; 24 pgs stuck inactive; 24 pgs stuck unclean
     monmap e1: 2 mons at {mac0090fa6aaf7a=10.240.212.78:6789/0,mac0090fa6ab68a=10.???.80:6789/0}, election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
     osdmap e102839: 22 osds: 22 up, 22 in
      pgmap v210270: 512 pgs, 1 pools, 0 bytes data, 0 objects
            51633 MB used, 63424 GB / 63475 GB avail
                  24 creating+peering
                 488 active+clean

And at serverA I cannot retrieve the file that I put into the Ceph cluster at serverB:

[root@serverA ~]# rados -p test32 get test.txt test.txt
2015-03-30 17:15:44.014158 7f06951b6700  0 -- 10.???.80:0/1002224 >> 10.???.78:6867/29047 pipe(0x21e0f90 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x21e1220).fault

2015-03-30 17:16:36.066125 7f0694fb4700  0 -- 10.???.80:0/1002224 >> 10.????.78:6867/29047 pipe(0x7f068000d880 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f068000db10).fault

It looks like it just hangs there forever. Is it waiting for all pgs to become ready? Or is the Ceph cluster in an error state?
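In general, client I/O to a pg that is not active (e.g. still creating or peering) will block, so the rados get is probably waiting on those 24 pgs. One way to see why they are stuck (the pg id 0.1f below is only an example, not taken from this cluster):

[root@serverA ~]# ceph health detail            # lists the stuck pg ids
[root@serverA ~]# ceph pg dump_stuck inactive   # pgs stuck in creating/peering
[root@serverA ~]# ceph pg 0.1f query            # detailed peering state of one pg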

From: Yueliang [yueliang9527@xxxxxxxxx]
Sent: Monday, March 30, 2015 1:50 PM
To: ceph-users@xxxxxxxxxxxxxx; Kai KH Huang
Subject: RE: Ceph osd is all up and in, but every pg is incomplete

I think there is no other way. :)

-- 
Yueliang
Sent with Airmail

On March 30, 2015 at 13:17:55, Kai KH Huang (huangkai2@xxxxxxxxxx) wrote:

Thanks for the quick response, and it seems to work! But what I expect to have is replica number = 3 on two servers (one host stores 2 copies, and the other stores the 3rd one -- to deal with disk failure, not only server failure). Is there a simple way to configure that, rather than building a custom CRUSH map?
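For reference, a rough sketch of the kind of custom CRUSH rule being discussed: pick 2 hosts, then up to 2 osds per host, so that with size=3 two copies land on one host and the third on the other. The rule name, ruleset id, and pool name are made up, not from this cluster:

rule replicated_2host_3copies {
        ruleset 1
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type host
        step chooseleaf firstn 2 type osd
        step emit
}

[root@serverA ~]# ceph osd pool set test31 crush_ruleset 1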



From: Yueliang [yueliang9527@xxxxxxxxx]
Sent: Monday, March 30, 2015 12:04 PM
To: ceph-users@xxxxxxxxxxxxxx; Kai KH Huang
Subject: Re: Ceph osd is all up and in, but every pg is incomplete

Hi  Kai KH

ceph -s reports "493 pgs undersized". I guess you created the pool with the default parameter size=3, but you only have two hosts, so there are not enough hosts to serve the pool. You should add a host, set size=2 when creating the pool, or modify the CRUSH rule.
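A minimal sketch of the size=2 fix, assuming the pool is named test31 (the pool name is just an example from this thread):

[root@serverA ~]# ceph osd pool set test31 size 2       # 2 replicas fit on 2 hosts
[root@serverA ~]# ceph osd pool set test31 min_size 1   # keep serving I/O with 1 copy left

New pools can also default to 2 replicas by putting osd_pool_default_size = 2 in ceph.conf before creating them.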

-- 
Yueliang
Sent with Airmail

On March 30, 2015 at 11:16:38, Kai KH Huang (huangkai2@xxxxxxxxxx) wrote:

Hi, all
    I'm a newbie to Ceph, and I have just set up a whole new Ceph cluster (0.87) with two servers. But its status is always warning:

[root@serverA ~]# ceph osd tree
# id    weight  type name       up/down reweight
-1      62.04   root default
-2      36.4            host serverA
0       3.64                    osd.0   up      1
2       3.64                    osd.2   up      1
1       3.64                    osd.1   up      1
3       3.64                    osd.3   up      1
4       3.64                    osd.4   up      1
5       3.64                    osd.5   up      1
6       3.64                    osd.6   up      1
7       3.64                    osd.7   up      1
8       3.64                    osd.8   up      1
9       3.64                    osd.9   up      1
-3      25.64           host serverB
10      3.64                    osd.10  up      1
11      2                       osd.11  up      1
12      2                       osd.12  up      1
13      2                       osd.13  up      1
14      2                       osd.14  up      1
15      2                       osd.15  up      1
16      2                       osd.16  up      1
17      2                       osd.17  up      1
18      2                       osd.18  up      1
19      2                       osd.19  up      1
20      2                       osd.20  up      1
21      2                       osd.21  up      1


[root@serverA ~]# ceph -s
    cluster ???????????????169715
     health HEALTH_WARN 493 pgs degraded; 19 pgs peering; 19 pgs stuck inactive; 512 pgs stuck unclean; 493 pgs undersized
     monmap e1: 2 mons at {serverB=10.??????.78:6789/0,serverA=10.?????.80:6789/0}, election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
     osdmap e92634: 22 osds: 22 up, 22 in
      pgmap v189018: 512 pgs, 1 pools, 0 bytes data, 0 objects
            49099 MB used, 63427 GB / 63475 GB avail
                 493 active+undersized+degraded
                  19 creating+peering

[root@serverA ~]# rados -p test31 ls
2015-03-30 09:57:18.607143 7f5251fcf700  0 -- :/1005913 >> 10.??????.78:6789/0 pipe(0x140a370 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x140a600).fault
2015-03-30 09:57:21.610994 7f52484ad700  0 -- 10.????.80:0/1005913 >> 10.????.78:6835/27111 pipe(0x140e010 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x140e2a0).fault
2015-03-30 10:02:21.650191 7f52482ab700  0 -- 10.????.80:0/1005913 >> 10.????78:6835/27111 pipe(0x7f5238016c80 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5238016f10).fault

* serverA is 10.???.80, serverB is 10.????.78
* ntpdate is updated
* I tried to remove the pool and re-create it, and cleaned up all objects inside, but nothing changed at all
* firewalls on both servers are shut off

Any clue is welcome, thanks.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com