Re: Growing the cluster.

What does your crushmap look like? There's a good chance you're choosing hosts first, and then OSDs, which means CRUSH can't come up with 3 replicas (because there are only two hosts).
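For reference, the stock ruleset Ceph creates usually ends in a host-level chooseleaf step, roughly like this (a sketch of the typical default; your decompiled map may differ, so check your own rules with the commands below):

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

With "step chooseleaf firstn 0 type host" and only two hosts in the tree, CRUSH can hand back at most two distinct OSDs per placement group, no matter how many OSDs each host carries, so a pool with rep size 3 stays degraded.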

Try:
ceph -o my.crush.map osd getcrushmap
crushtool -i my.crush.map --test --output-csv

and then look at the .csv files created in that directory; that simulates some random object placements, and will let you know which OSDs the crushmap chose. I bet you'll see that the data pool isn't replicating to 3 OSDs.
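If the data rule does turn out to choose by host, the cleanest fixes are to add a third OSD host or to drop the data pool back to two copies:

ceph osd pool set data size 2

For a throwaway test setup you could instead edit the rule to pick individual OSDs rather than hosts. A rough sketch (the .txt and new map file names here are just placeholders; reuse whatever you decompiled above):

crushtool -d my.crush.map -o my.crush.map.txt
# edit my.crush.map.txt: in the data rule, change
#   step chooseleaf firstn 0 type host
# to
#   step chooseleaf firstn 0 type osd
crushtool -c my.crush.map.txt -o my.new.crush.map
ceph osd setcrushmap -i my.new.crush.map

Keep in mind that choosing by osd lets two replicas land on the same host, so that's only reasonable for testing.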

On 05/20/2013 11:51 AM, Nicolas Fernandez wrote:
Hello,
I'm deploying a test cluster on version 0.61.2 with two nodes
(OSD/MDS) and another node (MON).
I have a problem growing my cluster: today I added an OSD to a
node that already had one. I did a reweight and added a replica. The
crushmap is up to date, but now I'm getting some pgs stuck
unclean. I've been checking the tunables options, but that hasn't
solved the issue. How can I fix the health of the cluster?

My cluster status:

# ceph -s
    health HEALTH_WARN 192 pgs degraded; 177 pgs stuck unclean; recovery 10910/32838 degraded (33.224%); clock skew detected on mon.b
    monmap e1: 3 mons at {a=192.168.2.144:6789/0,b=192.168.2.194:6789/0,c=192.168.2.145:6789/0}, election epoch 148, quorum 0,1,2 a,b,c
    osdmap e576: 3 osds: 3 up, 3 in
     pgmap v17715: 576 pgs: 79 active, 305 active+clean, 98 active+degraded, 94 active+clean+degraded; 1837 MB data, 6778 MB used, 440 GB / 446 GB avail; 10910/32838 degraded (33.224%)
    mdsmap e136: 1/1/1 up {0=a=up:active}

The replica configuration is:

pool 0 'data' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 192 pgp_num 192 last_change 576 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 192 pgp_num 192 last_change 556 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 192 pgp_num 192 last_change 1 owner 0

OSD Tree:

# ceph osd tree

# id    weight    type name    up/down    reweight
-1    3    root default
-3    3        rack unknownrack
-2    1            host ceph01
0    1                osd.0    up    1
-4    2            host ceph02
1    1                osd.1    up    1
2    1                osd.2    up    1

Thanks.




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
