Hi,

did you find an explanation for this?

I saw something similar on a customer's cluster. They reprovisioned
OSDs (I don't know whether any OSD ID was reused) on one host with
smaller disk sizes (the size had been changed through the RAID
controller to match the other hosts in that cluster), and the new OSDs
came back with their old CRUSH weights, reflecting the old disk sizes.
In Luminous I remember that changed reweights (I'm not sure about
CRUSH weights) were stored somewhere under /var/run/ceph/, but that no
longer seems to be the case, and it would only have been relevant
until a reboot anyway. I'd also be interested in where this
information is stored in newer releases and why it is stored in the
first place.
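
Just an idea for narrowing it down (untested, and assuming Octopus
still behaves like earlier releases here): the CRUSH weight an OSD
gets at boot should be governed by the osd_crush_update_on_start and
osd_crush_initial_weight settings, so I would check those first:

  ceph config get osd osd_crush_update_on_start
  ceph config get osd osd_crush_initial_weight

With update-on-start enabled and the initial weight at its default,
the OSD is supposed to derive its CRUSH weight from the reported
device size when it starts.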
Regards,
Eugen
Quoting Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx>:
Hi,
I stumbled across an issue where an OSD that gets redeployed ends up
with a CRUSH weight of 0 after cephadm finishes.
I have created a service definition for the orchestrator to
automatically deploy OSDs on SSDs:
service_type: osd
service_id: SSD_OSDs
placement:
  label: 'osd'
data_devices:
  rotational: 0
  size: '100G'
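
For completeness: I apply the spec with something like the following
(the file name here is just a placeholder):

root@ceph01:~# ceph orch apply osd -i SSD_OSDs.yaml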
These are my steps to reproduce this in a small test cluster running 15.2.4:
root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.63994  root default
-18         0.81995      rack rack10
 -3         0.40996          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
  1    ssd  0.09798              osd.1      up   1.00000  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000
root@ceph01:~# ceph osd out 1
marked out osd.1.
root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.63994  root default
-18         0.81995      rack rack10
 -3         0.40996          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
  1    ssd  0.09798              osd.1      up         0  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000
root@ceph01:~# ceph orch osd rm 1
Scheduled OSD(s) for removal
2020-09-10T16:29:58.176991+0200 mgr.ceph02.ouelws [INF] Removing daemon osd.1 from ceph01
2020-09-10T16:30:00.148659+0200 mgr.ceph02.ouelws [INF] Successfully removed OSD <1> on ceph01
root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.54196  root default
-18         0.72197      rack rack10
 -3         0.31198          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000
root@ceph01:~# ceph orch device zap ceph01 /dev/sdc --force
INFO:cephadm:/usr/bin/docker:stderr --> Zapping: /dev/sdc
INFO:cephadm:/usr/bin/docker:stderr --> Zapping lvm member /dev/sdc. lv_path is /dev/ceph-0d19a151-30b6-459e-936a-488f143e11f6/osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb
INFO:cephadm:/usr/bin/docker:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-0d19a151-30b6-459e-936a-488f143e11f6/osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb bs=1M count=10 conv=fsync
INFO:cephadm:/usr/bin/docker:stderr stderr: 10+0 records in
INFO:cephadm:/usr/bin/docker:stderr 10+0 records out
INFO:cephadm:/usr/bin/docker:stderr stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0583658 s, 180 MB/s
INFO:cephadm:/usr/bin/docker:stderr --> Only 1 LV left in VG, will proceed to destroy volume group ceph-0d19a151-30b6-459e-936a-488f143e11f6
INFO:cephadm:/usr/bin/docker:stderr Running command: /usr/sbin/vgremove -v -f ceph-0d19a151-30b6-459e-936a-488f143e11f6
INFO:cephadm:/usr/bin/docker:stderr stderr: Removing ceph--0d19a151--30b6--459e--936a--488f143e11f6-osd--block--d5062900--abe7--413a--9d9a--d1cdda2948eb (253:3)
INFO:cephadm:/usr/bin/docker:stderr stderr: Archiving volume group "ceph-0d19a151-30b6-459e-936a-488f143e11f6" metadata (seqno 5).
INFO:cephadm:/usr/bin/docker:stderr Releasing logical volume "osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb"
INFO:cephadm:/usr/bin/docker:stderr stderr: Creating volume group backup "/etc/lvm/backup/ceph-0d19a151-30b6-459e-936a-488f143e11f6" (seqno 6).
INFO:cephadm:/usr/bin/docker:stderr stdout: Logical volume "osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb" successfully removed
INFO:cephadm:/usr/bin/docker:stderr stderr: Removing physical volume "/dev/sdc" from volume group "ceph-0d19a151-30b6-459e-936a-488f143e11f6"
INFO:cephadm:/usr/bin/docker:stderr stdout: Volume group "ceph-0d19a151-30b6-459e-936a-488f143e11f6" successfully removed
INFO:cephadm:/usr/bin/docker:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync
INFO:cephadm:/usr/bin/docker:stderr stderr: 10+0 records in
INFO:cephadm:/usr/bin/docker:stderr 10+0 records out
INFO:cephadm:/usr/bin/docker:stderr stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.016043 s, 654 MB/s
INFO:cephadm:/usr/bin/docker:stderr --> Zapping successful for: <Raw Device: /dev/sdc>
2020-09-10T16:31:15.951617+0200 mgr.ceph02.ouelws [INF] Zap device ceph01:/dev/sdc
2020-09-10T16:31:24.738974+0200 mgr.ceph02.ouelws [INF] Found osd claims for drivegroup SSD_OSDs -> {}
2020-09-10T16:31:24.740489+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph01...
2020-09-10T16:31:31.549897+0200 mgr.ceph02.ouelws [INF] Deploying daemon osd.1 on ceph01
2020-09-10T16:31:33.057061+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph02...
2020-09-10T16:31:33.057373+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph03...
2020-09-10T16:31:33.057519+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph04...
2020-09-10T16:31:37.569914+0200 mon.ceph01 [INF] osd.1 [v2:10.24.4.128:6810/4173467371,v1:10.24.4.128:6811/4173467371] boot
2020-09-10T16:31:46.531544+0200 mon.ceph01 [INF] Cluster is now healthy
root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.54196  root default
-18         0.72197      rack rack10
 -3         0.31198          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
  1    ssd        0              osd.1      up   1.00000  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000
Why does osd.1 have a CRUSH weight of 0 now?

When the OSDs were initially deployed with the first ceph orch apply
command, the weights were set correctly according to the device sizes.

Why is there a difference between that process and an OSD that is
(re-)deployed later on?
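
As a workaround I could probably set the weight back by hand, e.g.

root@ceph01:~# ceph osd crush reweight osd.1 0.09798

(0.09798 being the old weight from the first tree above), but I would
like to understand why this is necessary at all.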
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx