Another strange thing is that the last few (24) PGs never seem to become ready and stay stuck in creating (even after 6 hours of waiting):
[root@serverA ~]# ceph -s
2015-03-30 17:14:48.720396 7feb5bd7a700 0 -- :/1000964 >> 10.???.78:6789/0 pipe(0x7feb60026120 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7feb600263b0).fault
cluster c09277a4-0eb9-41b1-b27f-a345c0169715
health HEALTH_WARN 24 pgs peering; 24 pgs stuck inactive; 24 pgs stuck unclean
monmap e1: 2 mons at {mac0090fa6aaf7a=10.240.212.78:6789/0,mac0090fa6ab68a=10.???.80:6789/0}, election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
osdmap e102839: 22 osds: 22 up, 22 in
pgmap v210270: 512 pgs, 1 pools, 0 bytes data, 0 objects
51633 MB used, 63424 GB / 63475 GB avail
24 creating+peering
488 active+clean
And from serverA I cannot retrieve the file that I put into the Ceph cluster on serverB:
[root@serverA ~]# rados -p test32 get test.txt test.txt
2015-03-30 17:15:44.014158 7f06951b6700 0 -- 10.???.80:0/1002224 >> 10.???.78:6867/29047 pipe(0x21e0f90 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x21e1220).fault
2015-03-30 17:16:36.066125 7f0694fb4700 0 -- 10.???.80:0/1002224 >> 10.????.78:6867/29047 pipe(0x7f068000d880 sd=6 :0 s=1 pgs=0 cs=0 l=1 c=0x7f068000db10).fault
It looks like it just hangs there forever. Is it waiting for all PGs to be ready, or is the Ceph cluster in an error state?
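In case it helps with diagnosis, these commands should show more detail about the stuck PGs (the pg id is just a placeholder for one of the 24 creating+peering PGs):

ceph health detail            # list exactly which pgs are stuck and why
ceph pg dump_stuck inactive   # show the stuck pgs and their acting osds
ceph pg <pgid> query          # detailed peering state of one stuck pg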
From: Yueliang [yueliang9527@xxxxxxxxx]
Sent: Monday, March 30, 2015 1:50 PM
To: ceph-users@xxxxxxxxxxxxxx; Kai KH Huang
Subject: RE: Ceph osd is all up and in, but every pg is incomplete
I think there is no other way. :)
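If you do go the custom CRUSH route, here is a rough, untested sketch of the kind of rule that should do it (rule name and ruleset number are placeholders; adjust to your map):

rule replicated_2hosts {
        ruleset 1
        type replicated
        min_size 1
        max_size 3
        step take default
        # pick 2 hosts, then 2 osds from each host; with pool size=3 the
        # first 3 of those 4 osds are used, so one host holds 2 copies
        # and the other holds the 3rd
        step choose firstn 2 type host
        step choose firstn 2 type osd
        step emit
}

Then compile and inject the edited map with crushtool / ceph osd setcrushmap, and point the pool at the new rule with ceph osd pool set <pool> crush_ruleset 1.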
--
Yueliang
Sent with Airmail
On March 30, 2015 at 13:17:55, Kai KH Huang (huangkai2@xxxxxxxxxx) wrote:
Thanks for the quick response, and it seems to work! But what I expect to have is replica number = 3 on two servers (one host storing 2 copies, and the other storing the 3rd one -- to deal with disk failure, rather than only server failure). Is there a simple way to configure that, rather than building a custom CRUSH map?
From: Yueliang [yueliang9527@xxxxxxxxx]
Sent: Monday, March 30, 2015 12:04 PM
To: ceph-users@xxxxxxxxxxxxxx; Kai KH Huang
Subject: Re: Ceph osd is all up and in, but every pg is incomplete
Hi Kai KH
ceph -s reports "493 pgs undersized". I guess you created the pool with the default parameter size=3, but you only have two hosts, so there are not enough hosts to serve the pool. You should add a host, set size=2 when creating the pool, or modify the CRUSH rule.
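For example, on your existing pool something like this should clear the undersized warning (pool name taken from your output; untested):

ceph osd pool set test31 size 2    # drop to 2 replicas so 2 hosts are enough

or set the default for newly created pools in ceph.conf:

[global]
osd pool default size = 2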
--
Yueliang
Sent with Airmail
On March 30, 2015 at 11:16:38, Kai KH Huang (huangkai2@xxxxxxxxxx) wrote:
Hi, all
I'm a newbie to Ceph, and I just set up a whole new Ceph cluster (0.87) with two servers. But its status is always a warning:
[root@serverA ~]# ceph osd tree
# id weight type name up/down reweight
-1 62.04 root default
-2 36.4 host serverA
0 3.64 osd.0 up 1
2 3.64 osd.2 up 1
1 3.64 osd.1 up 1
3 3.64 osd.3 up 1
4 3.64 osd.4 up 1
5 3.64 osd.5 up 1
6 3.64 osd.6 up 1
7 3.64 osd.7 up 1
8 3.64 osd.8 up 1
9 3.64 osd.9 up 1
-3 25.64 host serverB
10 3.64 osd.10 up 1
11 2 osd.11 up 1
12 2 osd.12 up 1
13 2 osd.13 up 1
14 2 osd.14 up 1
15 2 osd.15 up 1
16 2 osd.16 up 1
17 2 osd.17 up 1
18 2 osd.18 up 1
19 2 osd.19 up 1
20 2 osd.20 up 1
21 2 osd.21 up 1
[root@serverA ~]# ceph -s
cluster ???????????????169715
health HEALTH_WARN 493 pgs degraded; 19 pgs peering; 19 pgs stuck inactive; 512 pgs stuck unclean; 493 pgs undersized
monmap e1: 2 mons at {serverB=10.??????.78:6789/0,serverA=10.?????.80:6789/0}, election epoch 10, quorum 0,1 mac0090fa6aaf7a,mac0090fa6ab68a
osdmap e92634: 22 osds: 22 up, 22 in
pgmap v189018: 512 pgs, 1 pools, 0 bytes data, 0 objects
49099 MB used, 63427 GB / 63475 GB avail
493 active+undersized+degraded
19 creating+peering
[root@serverA ~]# rados -p test31 ls
2015-03-30 09:57:18.607143 7f5251fcf700 0 -- :/1005913 >> 10.??????.78:6789/0 pipe(0x140a370 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x140a600).fault
2015-03-30 09:57:21.610994 7f52484ad700 0 -- 10.????.80:0/1005913 >> 10.????.78:6835/27111 pipe(0x140e010 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x140e2a0).fault
2015-03-30 10:02:21.650191 7f52482ab700 0 -- 10.????.80:0/1005913 >> 10.????78:6835/27111 pipe(0x7f5238016c80 sd=5 :0 s=1 pgs=0 cs=0 l=1 c=0x7f5238016f10).fault
* serverA is 10.???.80, serverB is 10.????.78
* time is synchronized via ntpdate
* I tried removing the pool and re-creating it, and cleaning up all the objects inside, but no change at all
* firewalls on both servers are shut off
Any clue is welcomed, thanks.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com