Re: - cluster stuck and undersized if at least one osd is down

Hi,

As far as I understand, with pool size 2 there is a chance of losing data if another OSD dies while a rebuild is ongoing. However, that would have to happen on a different host, because my crushmap forbids storing replicas on the same physical node. I am not sure what setting min_size 2 would change, because the only thing I would gain is that there is no I/O to objects with fewer than 2 replicas while a rebuild is ongoing, and in that case my VMs would not be able to read data from the ceph pool. But maybe I got it wrong.
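
For reference, the current values can be checked and changed with something along these lines (assuming the pool is still named 'data' as in the dump below; size 3 is what min_size 2 is normally paired with):

ceph osd pool get data size
ceph osd pool get data min_size
# e.g. to keep 3 copies and block I/O when fewer than 2 are healthy:
ceph osd pool set data size 3
ceph osd pool set data min_size 2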


On 29.11.2016 at 03:08, Brad Hubbard wrote:

On Mon, Nov 28, 2016 at 9:54 PM, Piotr Dzionek <piotr.dzionek@xxxxxxxx> wrote:
Hi,
I recently installed 3 nodes ceph cluster v.10.2.3. It has 3 mons, and 12
osds. I removed default pool and created the following one:

pool 7 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 126 flags hashpspool
stripe_width 0
Do you understand the significance of min_size 1?

Are you OK with the likelihood of data loss that this value introduces?

The cluster is healthy when all OSDs are up; however, if I stop any of the OSDs, it
becomes stuck and undersized - it does not rebuild.

     cluster *****
      health HEALTH_WARN
             166 pgs degraded
             108 pgs stuck unclean
             166 pgs undersized
             recovery 67261/827220 objects degraded (8.131%)
             1/12 in osds are down
      monmap e3: 3 mons at
{**osd01=***.144:6789/0,***osd02=***.145:6789/0,**osd03=*****.146:6789/0}
             election epoch 14, quorum 0,1,2 **osd01,**osd02,**osd03
      osdmap e161: 12 osds: 11 up, 12 in; 166 remapped pgs
             flags sortbitwise
       pgmap v307710: 1024 pgs, 1 pools, 1230 GB data, 403 kobjects
             2452 GB used, 42231 GB / 44684 GB avail
             67261/827220 objects degraded (8.131%)
                  858 active+clean
                  166 active+undersized+degraded
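
For anyone hitting the same state, the affected placement groups can be inspected with something along these lines (the PG id is only a placeholder):

ceph health detail
ceph pg dump_stuck unclean
ceph pg dump_stuck undersized
ceph pg 7.1a query   # replace 7.1a with one of the PG ids reported above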

Replica size is 2 and I use the following crushmap:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host osd01 {
         id -2           # do not change unnecessarily
         # weight 14.546
         alg straw
         hash 0  # rjenkins1
         item osd.0 weight 3.636
         item osd.1 weight 3.636
         item osd.2 weight 3.636
         item osd.3 weight 3.636
}
host osd02 {
         id -3           # do not change unnecessarily
         # weight 14.546
         alg straw
         hash 0  # rjenkins1
         item osd.4 weight 3.636
         item osd.5 weight 3.636
         item osd.6 weight 3.636
         item osd.7 weight 3.636
}
host osd03 {
         id -4           # do not change unnecessarily
         # weight 14.546
         alg straw
         hash 0  # rjenkins1
         item osd.8 weight 3.636
         item osd.9 weight 3.636
         item osd.10 weight 3.636
         item osd.11 weight 3.636
}
root default {
         id -1           # do not change unnecessarily
         # weight 43.637
         alg straw
         hash 0  # rjenkins1
         item osd01 weight 14.546
         item osd02 weight 14.546
         item osd03 weight 14.546
}

# rules
rule replicated_ruleset {
         ruleset 0
         type replicated
         min_size 1
         max_size 10
         step take default
         step chooseleaf firstn 0 type host
         step emit
}

# end crush map
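
One way to sanity-check that this rule can actually place two replicas across the three hosts is to test the compiled map with crushtool, roughly like this (the file paths are just examples):

ceph osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt   # human-readable dump, should match the text above
crushtool -i /tmp/crushmap.bin --test --rule 0 --num-rep 2 --show-mappings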

I am not sure what the reason for the undersized state is. All OSD disks are the
same size and the replica size is 2. Also, data is only replicated on a per-host
basis and I have 3 separate hosts. Maybe the number of PGs is incorrect? Is
1024 too big? Or maybe there is some misconfiguration in the crushmap?
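
(For reference, the commonly cited rule of thumb is roughly (number of OSDs x 100) / pool size, rounded to a power of two, which here would give (12 x 100) / 2 = 600, i.e. 512 or 1024; so 1024 is on the high side of that estimate but not obviously wrong.)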


Kind regards,
Piotr Dzionek





--
Piotr Dzionek
System Administrator

SEQR Poland Sp. z o.o.
ul. Łąkowa 29, 90-554 Łódź, Poland
Mobile: +48 796555587
Mail: piotr.dzionek@xxxxxxxx
www.seqr.com | www.seamless.se

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



