Hi,

I stumbled across an issue where an OSD that gets redeployed has a CRUSH
weight of 0 after cephadm finishes.

I have created a service definition for the orchestrator to automatically
deploy OSDs on SSDs:

service_type: osd
service_id: SSD_OSDs
placement:
  label: 'osd'
data_devices:
  rotational: 0
  size: '100G'
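For completeness: the spec was applied via "ceph orch apply". The file name
below is only illustrative, its content is the spec shown above:

  ceph orch apply -i ssd_osds.yaml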
These are my steps to reproduce this in a small test cluster running 15.2.4:

root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.63994  root default
-18         0.81995      rack rack10
 -3         0.40996          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
  1    ssd  0.09798              osd.1      up   1.00000  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000

root@ceph01:~# ceph osd out 1
marked out osd.1.

root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.63994  root default
-18         0.81995      rack rack10
 -3         0.40996          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
  1    ssd  0.09798              osd.1      up         0  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000

root@ceph01:~# ceph orch osd rm 1
Scheduled OSD(s) for removal

2020-09-10T16:29:58.176991+0200 mgr.ceph02.ouelws [INF] Removing daemon osd.1 from ceph01
2020-09-10T16:30:00.148659+0200 mgr.ceph02.ouelws [INF] Successfully removed OSD <1> on ceph01

root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.54196  root default
-18         0.72197      rack rack10
 -3         0.31198          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000

root@ceph01:~# ceph orch device zap ceph01 /dev/sdc --force
INFO:cephadm:/usr/bin/docker:stderr --> Zapping: /dev/sdc
INFO:cephadm:/usr/bin/docker:stderr --> Zapping lvm member /dev/sdc. lv_path is /dev/ceph-0d19a151-30b6-459e-936a-488f143e11f6/osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb
INFO:cephadm:/usr/bin/docker:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-0d19a151-30b6-459e-936a-488f143e11f6/osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb bs=1M count=10 conv=fsync
INFO:cephadm:/usr/bin/docker:stderr stderr: 10+0 records in
INFO:cephadm:/usr/bin/docker:stderr 10+0 records out
INFO:cephadm:/usr/bin/docker:stderr stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0583658 s, 180 MB/s
INFO:cephadm:/usr/bin/docker:stderr --> Only 1 LV left in VG, will proceed to destroy volume group ceph-0d19a151-30b6-459e-936a-488f143e11f6
INFO:cephadm:/usr/bin/docker:stderr Running command: /usr/sbin/vgremove -v -f ceph-0d19a151-30b6-459e-936a-488f143e11f6
INFO:cephadm:/usr/bin/docker:stderr stderr: Removing ceph--0d19a151--30b6--459e--936a--488f143e11f6-osd--block--d5062900--abe7--413a--9d9a--d1cdda2948eb (253:3)
INFO:cephadm:/usr/bin/docker:stderr stderr: Archiving volume group "ceph-0d19a151-30b6-459e-936a-488f143e11f6" metadata (seqno 5).
INFO:cephadm:/usr/bin/docker:stderr Releasing logical volume "osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb"
INFO:cephadm:/usr/bin/docker:stderr stderr: Creating volume group backup "/etc/lvm/backup/ceph-0d19a151-30b6-459e-936a-488f143e11f6" (seqno 6).
INFO:cephadm:/usr/bin/docker:stderr stdout: Logical volume "osd-block-d5062900-abe7-413a-9d9a-d1cdda2948eb" successfully removed
INFO:cephadm:/usr/bin/docker:stderr stderr: Removing physical volume "/dev/sdc" from volume group "ceph-0d19a151-30b6-459e-936a-488f143e11f6"
INFO:cephadm:/usr/bin/docker:stderr stdout: Volume group "ceph-0d19a151-30b6-459e-936a-488f143e11f6" successfully removed
INFO:cephadm:/usr/bin/docker:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 conv=fsync
INFO:cephadm:/usr/bin/docker:stderr stderr: 10+0 records in
INFO:cephadm:/usr/bin/docker:stderr 10+0 records out
INFO:cephadm:/usr/bin/docker:stderr stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.016043 s, 654 MB/s
INFO:cephadm:/usr/bin/docker:stderr --> Zapping successful for: <Raw Device: /dev/sdc>

2020-09-10T16:31:15.951617+0200 mgr.ceph02.ouelws [INF] Zap device ceph01:/dev/sdc
2020-09-10T16:31:24.738974+0200 mgr.ceph02.ouelws [INF] Found osd claims for drivegroup SSD_OSDs -> {}
2020-09-10T16:31:24.740489+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph01...
2020-09-10T16:31:31.549897+0200 mgr.ceph02.ouelws [INF] Deploying daemon osd.1 on ceph01
2020-09-10T16:31:33.057061+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph02...
2020-09-10T16:31:33.057373+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph03...
2020-09-10T16:31:33.057519+0200 mgr.ceph02.ouelws [INF] Applying SSD_OSDs on host ceph04...
2020-09-10T16:31:37.569914+0200 mon.ceph01 [INF] osd.1 [v2:10.24.4.128:6810/4173467371,v1:10.24.4.128:6811/4173467371] boot
2020-09-10T16:31:46.531544+0200 mon.ceph01 [INF] Cluster is now healthy

root@ceph01:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         1.54196  root default
-18         0.72197      rack rack10
 -3         0.31198          host ceph01
  8    hdd  0.10699              osd.8      up   1.00000  1.00000
  9    hdd  0.10699              osd.9      up   1.00000  1.00000
  0    ssd  0.09799              osd.0      up   1.00000  1.00000
  1    ssd        0              osd.1      up   1.00000  1.00000
 -5         0.40999          host ceph02
 10    hdd  0.10699              osd.10     up   1.00000  1.00000
 11    hdd  0.10699              osd.11     up   1.00000  1.00000
  2    ssd  0.09799              osd.2      up   1.00000  1.00000
  3    ssd  0.09799              osd.3      up   1.00000  1.00000
-17         0.81999      rack rack11
 -7         0.40999          host ceph03
 12    hdd  0.10699              osd.12     up   1.00000  1.00000
 13    hdd  0.10699              osd.13     up   1.00000  1.00000
  4    ssd  0.09799              osd.4      up   1.00000  1.00000
  5    ssd  0.09799              osd.5      up   1.00000  1.00000
 -9         0.40999          host ceph04
 14    hdd  0.10699              osd.14     up   1.00000  1.00000
 15    hdd  0.10699              osd.15     up   1.00000  1.00000
  6    ssd  0.09799              osd.6      up   1.00000  1.00000
  7    ssd  0.09799              osd.7      up   1.00000  1.00000

Why does osd.1 have a CRUSH weight of 0 now? When the OSDs were initially
deployed with the first "ceph orch apply" command, their weights were set
correctly according to their size. Why is there a difference between that
initial deployment and an OSD that is (re-)deployed later on?
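For now I can put the weight back by hand; this is just a workaround, and the
value is simply the weight osd.1 had before it was removed (roughly the size
of the ~100 G device in TiB):

  ceph osd crush reweight osd.1 0.09798

I have not changed the settings that, as far as I understand, control how the
initial CRUSH weight is derived from the device size; for reference they can
be inspected with:

  ceph config get osd osd_crush_update_on_start
  ceph config get osd osd_crush_initial_weight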
Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Mandatory information per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin