Re: CRUSH straw2 can not handle big weight differences

On 01/29/2018 01:14 PM, Niklas wrote:
Ceph luminous 12.2.2
$: ceph osd pool create hybrid 1024 1024 replicated hybrid
$: ceph -s
   cluster:
     id:     e07f568d-056c-4e01-9292-732c64ab4f8e
     health: HEALTH_WARN
            Degraded data redundancy: 431 pgs unclean, 431 pgs degraded, 431 pgs undersized

   services:
     mon: 3 daemons, quorum s11,s12,s13
     mgr: s11(active), standbys: s12, s13
     osd: 54 osds: 54 up, 54 in

   data:
     pools:   1 pools, 1024 pgs
     objects: 0 objects, 0 bytes
     usage:   61749 MB used, 707 GB / 767 GB avail
     pgs:     593 active+clean
              431 active+undersized+degraded


This can be solved by manually setting the same weight, 1.000 (or 50.000, or 100.000, or ...), on all items (hosts) in datacenters vDC, vDC1, vDC2 and vDC3 (see below). The problem is that when new OSDs are added, the CRUSH map gets new weights in these datacenters, which breaks the cluster until the weights are fixed manually again.
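
For reference, a minimal sketch of how such bucket weights can be set by hand, by decompiling, editing and re-injecting the CRUSH map (the file names are arbitrary):

$: ceph osd getcrushmap -o crushmap.bin
$: crushtool -d crushmap.bin -o crushmap.txt
   (edit the item weights in the datacenter buckets of crushmap.txt)
$: crushtool -c crushmap.txt -o crushmap-new.bin
$: ceph osd setcrushmap -i crushmap-new.bin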

*My question is: in a datacenter that contains only 3 hosts, why is Ceph not mapping PGs to these 3 hosts? Clearly it has something to do with the big weight difference between the hosts, but why?*

------------------------
Below is a simplified Ceph setup of a hybrid solution with NVMe and HDD drives: 1 copy on NVMe and 2 copies on HDD. The advantage is great read performance and cost savings; the disadvantage is lower write performance. Still, the write performance is good thanks to RocksDB on Intel Optane disks in the HDD servers.
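
As a point of comparison, such hybrid pools are often expressed with two take/emit steps instead of per-datacenter buckets. A rough sketch, assuming device classes nvme and hdd have been assigned to the OSDs and a root named default exists (the rule id is arbitrary; this is not the rule used in the map below):

rule hybrid-sketch {
     id 2
     type replicated
     min_size 1
     max_size 10
     step take default class nvme
     step chooseleaf firstn 1 type host
     step emit
     step take default class hdd
     step chooseleaf firstn -1 type host
     step emit
}

Here firstn 1 places the first copy on an NVMe host, and firstn -1 places the remaining pool-size minus 1 copies on HDD hosts.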

I have six servers in this virtualized lab ceph cluster.
Only NVMe drives on s11, s12 and s13.
Only HDD drives on s21, s22 and s23.


# buckets
host s11 {
     # weight 0.016
     alg straw2
     hash 0
     item osd.0 weight 0.002
     item osd.1 weight 0.002
     item osd.2 weight 0.006
     item osd.3 weight 0.002
     item osd.4 weight 0.002
     item osd.18 weight 0.002
}

host s12 {
     # weight 0.016
     alg straw2
     hash 0
     item osd.5 weight 0.006
     item osd.6 weight 0.002
     item osd.7 weight 0.002
     item osd.8 weight 0.002
     item osd.9 weight 0.002
     item osd.53 weight 0.002
}

host s13 {
     # weight 0.016
     alg straw2
     hash 0
     item osd.10 weight 0.006
     item osd.11 weight 0.002
     item osd.12 weight 0.002
     item osd.13 weight 0.002
     item osd.14 weight 0.002
     item osd.54 weight 0.002
}

host s21 {
     # weight 0.228
     alg straw2
     hash 0
     item osd.15 weight 0.019
     item osd.16 weight 0.019
     item osd.17 weight 0.019
     item osd.19 weight 0.019
     item osd.20 weight 0.019
     item osd.21 weight 0.019
     item osd.22 weight 0.019
     item osd.23 weight 0.019
     item osd.24 weight 0.019
     item osd.25 weight 0.019
     item osd.26 weight 0.019
     item osd.51 weight 0.019
}

host s22 {
     # weight 0.228
     alg straw2
     hash 0
     item osd.27 weight 0.019
     item osd.28 weight 0.019
     item osd.29 weight 0.019
     item osd.30 weight 0.019
     item osd.31 weight 0.019
     item osd.32 weight 0.019
     item osd.33 weight 0.019
     item osd.34 weight 0.019
     item osd.35 weight 0.019
     item osd.36 weight 0.019
     item osd.37 weight 0.019
     item osd.38 weight 0.019
}

host s23 {
     # weight 0.228
     alg straw2
     hash 0
     item osd.39 weight 0.019
     item osd.40 weight 0.019
     item osd.41 weight 0.019
     item osd.42 weight 0.019
     item osd.43 weight 0.019
     item osd.44 weight 0.019
     item osd.45 weight 0.019
     item osd.46 weight 0.019
     item osd.47 weight 0.019
     item osd.48 weight 0.019
     item osd.49 weight 0.019
     item osd.50 weight 0.019
}

datacenter vDC1 {
     # weight 20.016
     alg straw2
     hash 0
     item s11 weight 0.016
     item s22 weight 10.000
     item s23 weight 10.000
}

datacenter vDC2 {
     # weight 0.472
     alg straw2
     hash 0
     item s12 weight 0.016
     item s21 weight 0.228
     item s23 weight 0.228
}

datacenter vDC3 {
     # weight 0.472
     alg straw2
     hash 0
     item s13 weight 0.016
     item s21 weight 0.228
     item s22 weight 0.228
}

datacenter vDC {
     # weight 1.416
     alg straw2
     hash 0
     item vDC1 weight 0.472
     item vDC2 weight 0.472
     item vDC3 weight 0.472
}

# rules
rule hybrid {
     id 1
     type replicated
     min_size 1
     max_size 10
     step take vDC
     step choose firstn 1 type datacenter
     step chooseleaf firstn 0 type host
     step emit
}
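
A quick way to see this rule failing without touching the cluster is to replay it with crushtool against the compiled map. A sketch, assuming the map has been saved as crushmap.bin as above:

$: crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-bad-mappings

--show-bad-mappings prints every input that mapped to fewer than 3 OSDs, which should correspond to the undersized PGs shown above.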


Is it your intention to put all copies of an object in only one DC?

What is the exact idea behind this rule? What is its purpose?

Wido

Again, setting the same weight, like 1.000 or 100.000, on all items in datacenters vDC, vDC1, vDC2 and vDC3 makes the cluster work.
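
One knob that is directly related: CRUSH gives up on a bucket after a limited number of draws, controlled by the choose_total_tries tunable in the decompiled map (50 by default with recent tunable profiles). With a large weight ratio the low-weight host is rarely drawn, so the retry budget can run out before all 3 hosts are found. A sketch of raising it in the map header, where 100 is an arbitrary example value and no guarantee for extreme ratios:

tunable choose_total_tries 100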
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



