Hi,

did you find an explanation for this?

I saw something similar on a customer's cluster. They reprovisioned
OSDs (I don't know whether any OSD ID was reused) on one host with
smaller disk sizes (the size had been changed through the RAID
controller to match the other hosts in that cluster), and the new OSDs
came back with their old CRUSH weights, reflecting the old disk sizes.
In Luminous I remember that changed reweights (I'm not sure about
CRUSH weights) were stored somewhere under /var/run/ceph/, but that no
longer seems to be the case, and it would only have been relevant
until a reboot anyway. I'd also be interested in where this
information is stored in newer releases and why it is stored in the
first place.
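
Just an idea for narrowing it down (untested, and assuming Octopus
still behaves like earlier releases here): the CRUSH weight an OSD
gets at boot should be governed by the osd_crush_update_on_start and
osd_crush_initial_weight settings, so I would check those first:

  ceph config get osd osd_crush_update_on_start
  ceph config get osd osd_crush_initial_weight

With update-on-start enabled and the initial weight at its default,
the OSD is supposed to derive its CRUSH weight from the reported
device size when it starts.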
Regards,
Eugen
Quoting Robert Sander <r.sander@xxxxxxxxxxxxxxxxxxx>:
Hi,
I stumbled across an issue where an OSD that gets redeployed ends up
with a CRUSH weight of 0 after cephadm finishes.
I have created a service definition for the orchestrator to
automatically deploy OSDs on SSDs:
service_type: osd
service_id: SSD_OSDs
placement:
  label: 'osd'
data_devices:
  rotational: 0
  size: '100G'
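
For completeness: I apply the spec with something like the following
(the file name here is just a placeholder):

root@ceph01:~# ceph orch apply osd -i SSD_OSDs.yaml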
These are my steps to reproduce this in a small test cluster running 15.2.4:
root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.63994  root default
-18         0.81995      rack rack10
 -3         0.40996          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
  1    ssd  0.09798              osd.1      up   1.00000  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000
root@ceph01:~# ceph osd out 1
marked out osd.1.
root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.63994  root default
-18         0.81995      rack rack10
 -3         0.40996          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
  1    ssd  0.09798              osd.1      up         0  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000
root@ceph01:~# ceph orch osd rm 1
Scheduled OSD(s) for removal
2020-09-10T16:29:58.176991+0200 mgr.ceph02.ouelws [INF] Removing daemon osd.1 from ceph01
2020-09-10T16:30:00.148659+0200 mgr.ceph02.ouelws [INF] Successfully removed OSD <1> on ceph01
root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.54196  root default
-18         0.72197      rack rack10
 -3         0.31198          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000
root@ceph01:~# ceph orch device zap ceph01 /dev/sdc --force
INFO:cephadm:/usr/bin/docker:stderr --> Zapping: /dev/sdc
INFO:cephadm:/usr/bin/docker:stderr --> Zapping lvm member /dev/sdc. lv_path is /dev/ceph-0d19a151-30b6-459e-936a-488f143e11f6/osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb
INFO:cephadm:/usr/bin/docker:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-0d19a151-30b6-459e-936a-488f143e11f6/osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb bs=1M count=10 conv=fsync
INFO:cephadm:/usr/bin/docker:stderr stderr: 10+0 records in
INFO:cephadm:/usr/bin/docker:stderr 10+0 records out
INFO:cephadm:/usr/bin/docker:stderr stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0583658 s, 180 MB/s
INFO:cephadm:/usr/bin/docker:stderr --> Only 1 LV left in VG, will proceed to destroy volume group ceph-0d19a151-30b6-459e-936a-488f143e11f6
INFO:cephadm:/usr/bin/docker:stderr Running command: /usr/sbin/vgremove -v -f ceph-0d19a151-30b6-459e-936a-488f143e11f6
INFO:cephadm:/usr/bin/docker:stderr stderr: Removing ceph--0d19a151--30b6--459e--936a--488f143e11f6-osd--block--d5062900--abe7--413a--9d9a--d1cdda2948eb (253:3)
INFO:cephadm:/usr/bin/docker:stderr stderr: Archiving volume group "ceph-0d19a151-30b6-459e-936a-488f143e11f6" metadata (seqno 5).
INFO:cephadm:/usr/bin/docker:stderr Releasing logical volume "osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb"
INFO:cephadm:/usr/bin/docker:stderr stderr: Creating volume group backup "/etc/lvm/backup/ceph-0d19a151-30b6-459e-936a-488f143e11f6" (seqno 6).
INFO:cephadm:/usr/bin/docker:stderr stdout: Logical volume "osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb" successfully removed
INFO:cephadm:/usr/bin/docker:stderr stderr: Removing physical volume "/dev/sdc" from volume group "ceph-0d19a151-30b6-459e-936a-488f143e11f6"
INFO:cephadm:/usr/bin/docker:stderr stdout: Volume group "ceph-0d19a151-30b6-459e-936a-488f143e11f6" successfully removed
INFO:cephadm:/usr/bin/docker:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync
INFO:cephadm:/usr/bin/docker:stderr stderr: 10+0 records in
INFO:cephadm:/usr/bin/docker:stderr 10+0 records out
INFO:cephadm:/usr/bin/docker:stderr stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.016043 s, 654 MB/s
INFO:cephadm:/usr/bin/docker:stderr --> Zapping successful for: <Raw Device: /dev/sdc>
2020-09-10T16:31:15.951617+0200 mgr.ceph02.ouelws [INF] Zap device ceph01:/dev/sdc
2020-09-10T16:31:24.738974+0200 mgr.ceph02.ouelws [INF] Found osd claims for drivegroup SSD_OSDs -> {}
2020-09-10T16:31:24.740489+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph01...
2020-09-10T16:31:31.549897+0200 mgr.ceph02.ouelws [INF] Deploying daemon osd.1 on ceph01
2020-09-10T16:31:33.057061+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph02...
2020-09-10T16:31:33.057373+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph03...
2020-09-10T16:31:33.057519+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph04...
2020-09-10T16:31:37.569914+0200 mon.ceph01 [INF] osd.1 [v2:10.24.4.128:6810/4173467371,v1:10.24.4.128:6811/4173467371] boot
2020-09-10T16:31:46.531544+0200 mon.ceph01 [INF] Cluster is now healthy
root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.54196  root default
-18         0.72197      rack rack10
 -3         0.31198          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
  1    ssd        0              osd.1      up   1.00000  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000
Why does osd.1 have a CRUSH weight of 0 now?

When the OSDs were initially deployed with the first ceph orch apply
command, the weights were set correctly according to the device sizes.

Why is there a difference between that process and an OSD that is
(re-)deployed later on?
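
As a workaround I could probably set the weight back by hand, e.g.

root@ceph01:~# ceph osd crush reweight osd.1 0.09798

(0.09798 being the old weight from the first tree above), but I would
like to understand why this is necessary at all.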
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx