cluster stuck and undersized if at least one osd is down

Hi,
I recently installed a 3-node Ceph cluster, v10.2.3. It has 3 mons and 12 OSDs. I removed the default pool and created the following one:

pool 7 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 126 flags hashpspool stripe_width 0
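
For reference, the pool's replication settings can be confirmed like this (pool name 'data' as above):

    ceph osd pool get data size
    ceph osd pool get data min_size
    ceph osd pool get data pg_num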

The cluster is healthy if all OSDs are up; however, if I stop any of the OSDs, it becomes stuck and undersized - it does not rebuild.

    cluster *****
     health HEALTH_WARN
            166 pgs degraded
            108 pgs stuck unclean
            166 pgs undersized
            recovery 67261/827220 objects degraded (8.131%)
            1/12 in osds are down
     monmap e3: 3 mons at {**osd01=***.144:6789/0,***osd02=***.145:6789/0,**osd03=*****.146:6789/0}
            election epoch 14, quorum 0,1,2 **osd01,**osd02,**osd03
     osdmap e161: 12 osds: 11 up, 12 in; 166 remapped pgs
            flags sortbitwise
      pgmap v307710: 1024 pgs, 1 pools, 1230 GB data, 403 kobjects
            2452 GB used, 42231 GB / 44684 GB avail
            67261/827220 objects degraded (8.131%)
                 858 active+clean
                 166 active+undersized+degraded

Replica size is 2 and I use the following crush map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host osd01 {
        id -2           # do not change unnecessarily
        # weight 14.546
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 3.636
        item osd.1 weight 3.636
        item osd.2 weight 3.636
        item osd.3 weight 3.636
}
host osd02 {
        id -3           # do not change unnecessarily
        # weight 14.546
        alg straw
        hash 0  # rjenkins1
        item osd.4 weight 3.636
        item osd.5 weight 3.636
        item osd.6 weight 3.636
        item osd.7 weight 3.636
}
host osd03 {
        id -4           # do not change unnecessarily
        # weight 14.546
        alg straw
        hash 0  # rjenkins1
        item osd.8 weight 3.636
        item osd.9 weight 3.636
        item osd.10 weight 3.636
        item osd.11 weight 3.636
}
root default {
        id -1           # do not change unnecessarily
        # weight 43.637
        alg straw
        hash 0  # rjenkins1
        item osd01 weight 14.546
        item osd02 weight 14.546
        item osd03 weight 14.546
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
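
To rule out a problem with the rule itself, the map can also be tested offline with crushtool; the test below simulates placement for ruleset 0 with 2 replicas and reports any inputs that do not get 2 OSDs (the file names are just placeholders):

    # dump and decompile the current crush map
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # simulate placement for rule 0 with 2 replicas; bad mappings would point at a rule problem
    crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-bad-mappings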

I am not sure what the reason for the undersized state is. All OSD disks are the same size and the replica size is 2. Also, data is only replicated on a per-host basis and I have 3 separate hosts. Maybe the number of PGs is incorrect? Is 1024 too big? Or maybe there is some misconfiguration in the crush map?
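
In case it helps, this is what I am looking at to inspect the stuck PGs (the pg id in the last command is only an example placeholder from pool 7):

    ceph health detail
    ceph osd tree
    ceph pg dump_stuck unclean
    ceph pg 7.1a query    # placeholder pg id; any undersized pg listed by health detail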


Kind regards,
Piotr Dzionek

  

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
