ceph osd crush reweight rounding issue

Hi,

After some tests with OSD CRUSH weights I wanted to restore the original weight of the OSD, but that proves to be difficult. See the following example:

The situation before any OSD crush reweight has taken place:

ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-11         0.39197  datacenter dc1
  -5         0.39197      host node1
   2    hdd  0.09798          osd.2            up   1.00000  1.00000
   3    hdd  0.09798          osd.3            up   1.00000  1.00000
   6    ssd  0.09798          osd.6            up   1.00000  1.00000
   7    ssd  0.09798          osd.7            up   1.00000  1.00000
-10         0.39197  datacenter dc2
  -3         0.39197      host node2
   0    hdd  0.09798          osd.0            up   1.00000  1.00000
   1    hdd  0.09798          osd.1            up   1.00000  1.00000
   4    ssd  0.09798          osd.4            up   1.00000  1.00000
   5    ssd  0.09798          osd.5            up   1.00000  1.00000
  -1               0  root default


ceph osd crush reweight osd.5 0.1

ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-11         0.39197  datacenter dc1
 -5         0.39197      host node1
  2    hdd  0.09798          osd.2            up   1.00000  1.00000
  3    hdd  0.09798          osd.3            up   1.00000  1.00000
  6    ssd  0.09798          osd.6            up   1.00000  1.00000
  7    ssd  0.09798          osd.7            up   1.00000  1.00000
-10         0.39392  datacenter dc2
 -3         0.39392      host node2
  0    hdd  0.09798          osd.0            up   1.00000  1.00000
  1    hdd  0.09798          osd.1            up   1.00000  1.00000
  4    ssd  0.09798          osd.4            up   1.00000  1.00000
  5    ssd  0.09999          osd.5            up   1.00000  1.00000
 -1               0  root default
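
If I understand it correctly, CRUSH stores item weights as 16.16 fixed-point integers (1.0 == 0x10000 == 65536), and judging by the output above "ceph osd crush reweight" seems to truncate rather than round when converting the given float. That would explain why 0.1 ends up as 0.09999. A minimal sketch of what I think happens (the truncating conversion is my assumption, I have not checked the code):

SCALE = 0x10000  # CRUSH 16.16 fixed point: 1.0 == 65536

def to_raw(weight: float) -> int:
    # assumed truncating conversion done by "ceph osd crush reweight"
    return int(weight * SCALE)

def to_display(raw: int) -> float:
    # the 5-decimal figure "ceph osd tree" prints
    return round(raw / SCALE, 5)

print(to_raw(0.1), to_display(to_raw(0.1)))          # 6553 0.09999
print(to_raw(0.09798), to_display(to_raw(0.09798)))  # 6421 0.09798

So setting the displayed value 0.09798 again apparently lands on a raw value that is not the one the OSDs originally had, which would explain why the bucket totals below no longer match.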

ceph osd crush reweight osd.5 0.09798
reweighted item id 5 name 'osd.5' to 0.09798 in crush map

ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-11         0.39197  datacenter dc1
 -5         0.39197      host node1
  2    hdd  0.09798          osd.2            up   1.00000  1.00000
  3    hdd  0.09798          osd.3            up   1.00000  1.00000
  6    ssd  0.09798          osd.6            up   1.00000  1.00000
  7    ssd  0.09798          osd.7            up   1.00000  1.00000
-10         0.39191  datacenter dc2
 -3         0.39191      host node2
  0    hdd  0.09798          osd.0            up   1.00000  1.00000
  1    hdd  0.09798          osd.1            up   1.00000  1.00000
  4    ssd  0.09798          osd.4            up   1.00000  1.00000
  5    ssd  0.09798          osd.5            up   1.00000  1.00000
 -1               0  root default

The difference is (0.39197 - 0.39191) = 0.00006. If we divide this by the four OSDs per host bucket, that is 0.000015 per OSD. So each OSD could be reweighted to 0.09798 + 0.000015 = 0.097995 to give the host / datacenter buckets equal weights.
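
In the same fixed-point terms (and with the same truncation assumption as above), 0.097995 lands on raw value 6422, which displays as 0.09799 per OSD and gives the four-OSD bucket 0.39197 again:

SCALE = 0x10000

raw = int(0.097995 * SCALE)       # 6422, assuming truncation
print(round(raw / SCALE, 5))      # 0.09799 -> per-OSD display
print(round(4 * raw / SCALE, 5))  # 0.39197 -> host / datacenter bucket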

This indeed works. After reweighting all four OSDs under dc2 to 0.097995:

-10         0.39197  datacenter dc2
 -3         0.39197      host node2
  0    hdd  0.09799          osd.0            up   1.00000  1.00000
  1    hdd  0.09799          osd.1            up   1.00000  1.00000
  4    ssd  0.09799          osd.4            up   1.00000  1.00000
  5    ssd  0.09799          osd.5            up   1.00000  1.00000
 -1               0  root default

Note, however, that the OSD weights now differ from those of the (identical) OSDs in the other datacenter.

Why is this such a big deal? Well, Ceph "stretch mode" requires the datacenter buckets to have equal weight. And IMHO it should be easy to set the exact weight that CRUSH uses internally for its calculations.
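
One idea I have not tried: if the conversion really is a plain truncating multiply by 0x10000, then passing a float that is an exact multiple of 1/65536 should land on exactly the raw value you ask for. A sketch, where the raw value 6422 is only an example and the whole thing rests on my assumption about how the CLI converts the weight:

SCALE = 0x10000  # CRUSH 16.16 fixed point

def weight_for_raw(raw: int) -> float:
    # raw / SCALE is exactly representable (SCALE is a power of two),
    # so it should survive the truncating conversion unchanged
    return raw / SCALE

w = weight_for_raw(6422)  # hypothetical target raw value
print(f"ceph osd crush reweight osd.5 {w!r}")
# -> ceph osd crush reweight osd.5 0.097991943359375

Even then "ceph osd tree" only prints five decimals, so it is hard to see which raw value you actually ended up with (the bucket items in "ceph osd crush dump" should show the raw integers, if I am not mistaken).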

Does anyone have more info on how to do this?

Gr. Stefan

P.s. Sure, I could reweight all OSDs to a new weight, but that's more of a workaround than a proper way of fixing it.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


