Re: Ceph osd is all up and in, but every pg is incomplete

Another strange thing is that the last few (24) pgs never seem to get ready and stay stuck at creating (after 6 hours of waiting):

[root@serverA ~]# ceph -s
2015-03-30 17:14:48.720396 7feb5bd7a700  0 -- :/1000964 >> 10.???.78:6789/0 pipe(0x7feb60026120 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7feb600263b0).fault
    cluster c09277a4-0eb9-41b1-b27f-a345c0169715
     health HEALTH_WARN 24 pgs peering; 24 pgs stuck inactive; 24 pgs stuck unclean
     monmap e1: 2 mons at {mac0090fa6aaf7a=10.240.212.78:6789/0,mac0090fa6ab68a=10.???.80:6789/0}, election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
     osdmap e102839: 22 osds: 22 up, 22 in
      pgmap v210270: 512 pgs, 1 pools, 0 bytes data, 0 objects
            51633 MB used, 63424 GB / 63475 GB avail
                  24 creating+peering
                 488 active+clean

And at serverA I cannot retrieve the file that I put into the Ceph cluster at serverB:

[root@serverA ~]# rados -p test32 get test.txt test.txt
2015-03-30 17:15:44.014158 7f06951b6700  0 -- 10.???.80:0/1002224 >> 10.???.78:6867/29047 pipe(0x21e0f90 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x21e1220).fault

2015-03-30 17:16:36.066125 7f0694fb4700  0 -- 10.???.80:0/1002224 >> 10.????.78:6867/29047 pipe(0x7f068000d880 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f068000db10).fault

It looks like it just hangs there forever. Is it waiting for all pgs to become ready? Or is the Ceph cluster in an error state?
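In general, client I/O to a pg that is not active (e.g. still creating or peering) will block, so the rados get is probably waiting on those 24 pgs. One way to see why they are stuck (the pg id 0.1f below is only an example, not taken from this cluster):

[root@serverA ~]# ceph health detail            # lists the stuck pg ids
[root@serverA ~]# ceph pg dump_stuck inactive   # pgs stuck in creating/peering
[root@serverA ~]# ceph pg 0.1f query            # detailed peering state of one pg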

From: Yueliang [yueliang9527@xxxxxxxxx]
Sent: Monday, March 30, 2015 1:50 PM
To: ceph-users@xxxxxxxxxxxxxx; Kai KH Huang
Subject: RE: Ceph osd is all up and in, but every pg is incomplete

I think there is no other way. :)

-- 
Yueliang
Sent with Airmail

On March 30, 2015 at 13:17:55, Kai KH Huang (huangkai2@xxxxxxxxxx) wrote:

Thanks for the quick response, and it seems to work! But what I expect to have is replica number = 3 on two servers (one host stores 2 copies, and the other stores the 3rd one -- to deal with disk failure, not only server failure). Is there a simple way to configure that, rather than building a custom CRUSH map?
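For reference, a rough sketch of the kind of custom CRUSH rule being discussed: pick 2 hosts, then up to 2 osds per host, so that with size=3 two copies land on one host and the third on the other. The rule name, ruleset id, and pool name are made up, not from this cluster:

rule replicated_2host_3copies {
        ruleset 1
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type host
        step chooseleaf firstn 2 type osd
        step emit
}

[root@serverA ~]# ceph osd pool set test31 crush_ruleset 1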



From: Yueliang [yueliang9527@xxxxxxxxx]
Sent: Monday, March 30, 2015 12:04 PM
To: ceph-users@xxxxxxxxxxxxxx; Kai KH Huang
Subject: Re: Ceph osd is all up and in, but every pg is incomplete

Hi  Kai KH

ceph -s reports "493 pgs undersized". I guess you created the pool with the default parameter size=3, but you only have two hosts, so there are not enough hosts to serve the pool. You should add a host, set size=2 when creating the pool, or modify the CRUSH rule.
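A minimal sketch of the size=2 fix, assuming the pool is named test31 (the pool name is just an example from this thread):

[root@serverA ~]# ceph osd pool set test31 size 2       # 2 replicas fit on 2 hosts
[root@serverA ~]# ceph osd pool set test31 min_size 1   # keep serving I/O with 1 copy left

New pools can also default to 2 replicas by putting osd_pool_default_size = 2 in ceph.conf before creating them.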

-- 
Yueliang
Sent with Airmail

On March 30, 2015 at 11:16:38, Kai KH Huang (huangkai2@xxxxxxxxxx) wrote:

Hi, all
    I'm a newbie to Ceph, and I have just set up a whole new Ceph cluster (0.87) with two servers. But its status is always warning:

[root@serverA ~]# ceph osd tree
# id    weight  type name       up/down reweight
-1      62.04   root default
-2      36.4            host serverA
0       3.64                    osd.0   up      1
2       3.64                    osd.2   up      1
1       3.64                    osd.1   up      1
3       3.64                    osd.3   up      1
4       3.64                    osd.4   up      1
5       3.64                    osd.5   up      1
6       3.64                    osd.6   up      1
7       3.64                    osd.7   up      1
8       3.64                    osd.8   up      1
9       3.64                    osd.9   up      1
-3      25.64           host serverB
10      3.64                    osd.10  up      1
11      2                       osd.11  up      1
12      2                       osd.12  up      1
13      2                       osd.13  up      1
14      2                       osd.14  up      1
15      2                       osd.15  up      1
16      2                       osd.16  up      1
17      2                       osd.17  up      1
18      2                       osd.18  up      1
19      2                       osd.19  up      1
20      2                       osd.20  up      1
21      2                       osd.21  up      1


[root@serverA ~]# ceph -s
    cluster ???????????????169715
     health HEALTH_WARN 493 pgs degraded; 19 pgs peering; 19 pgs stuck inactive; 512 pgs stuck unclean; 493 pgs undersized
     monmap e1: 2 mons at {serverB=10.??????.78:6789/0,serverA=10.?????.80:6789/0}, election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
     osdmap e92634: 22 osds: 22 up, 22 in
      pgmap v189018: 512 pgs, 1 pools, 0 bytes data, 0 objects
            49099 MB used, 63427 GB / 63475 GB avail
                 493 active+undersized+degraded
                  19 creating+peering

[root@serverA ~]# rados -p test31 ls
2015-03-30 09:57:18.607143 7f5251fcf700  0 -- :/1005913 >> 10.??????.78:6789/0 pipe(0x140a370 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x140a600).fault
2015-03-30 09:57:21.610994 7f52484ad700  0 -- 10.????.80:0/1005913 >> 10.????.78:6835/27111 pipe(0x140e010 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x140e2a0).fault
2015-03-30 10:02:21.650191 7f52482ab700  0 -- 10.????.80:0/1005913 >> 10.????78:6835/27111 pipe(0x7f5238016c80 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5238016f10).fault

* serverA is 10.???.80, serverB is 10.????.78
* ntpdate is updated
* I tried to remove the pool and re-create it, and cleaned up all objects inside, but nothing changed at all
* firewalls on both servers are shut off

Any clue is welcome, thanks.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com