Hi Janek,

Have you tried looking into the orchestrator's decisions?

$ ceph config set mgr mgr/cephadm/log_to_cluster_level debug

then

$ ceph -W cephadm --watch-debug

or look into the active MGR's /var/log/ceph/$(ceph fsid)/ceph.cephadm.log

Regards,
Frédéric.

----- On 10 Jan 25, at 13:53, Janek Bevendorff janek.bevendorff@xxxxxxxxxxxxx wrote:

> Hi,
>
> I'm having a strange problem with the orchestrator. My cluster has the
> following OSD services configured based on certain attributes of the disks:
>
> NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> ...
> osd.osd-default-hdd          1351     2m ago     22m  label:osd;HOSTPREFIX*
> osd.osd-default-ssd             0     -          22m  label:osd;HOSTPREFIX*
> osd.osd-small-hdd              41     2m ago     22m  label:osd;HOSTPREFIX*
>
> These apply to three device types: large HDDs (8TB+), small HDDs
> (250G-7TB), and SSDs (1TB+). I did that with the following YAML definitions:
>
> service_type: osd
> service_id: osd-default-hdd
> service_name: osd.osd-default-hdd
> placement:
>   host_pattern: HOSTPREFIX*
>   label: osd
> spec:
>   crush_device_class: hdd
>   data_devices:
>     rotational: 1
>     size: '8T:'
>   filter_logic: AND
>   objectstore: bluestore
>   osds_per_device: 1
> ---
> service_type: osd
> service_id: osd-default-ssd
> service_name: osd.osd-default-ssd
> placement:
>   host_pattern: HOSTPREFIX*
>   label: osd
> spec:
>   crush_device_class: ssd
>   data_devices:
>     rotational: 0
>     size: '1T:'
>   filter_logic: AND
>   objectstore: bluestore
>   osds_per_device: 1
> ---
> service_type: osd
> service_id: osd-small-hdd
> service_name: osd.osd-small-hdd
> placement:
>   host_pattern: HOSTPREFIX*
>   label: osd
> spec:
>   crush_device_class: hdd-small
>   data_devices:
>     rotational: 1
>     size: 250G:7T
>   filter_logic: AND
>   objectstore: bluestore
>   osds_per_device: 1
>
> Previously, this worked perfectly, but as you can see in the summary
> above, the orchestrator has now suddenly started ignoring the device class
> and data_devices filters for SSDs and has incorrectly added all SSDs to the
> osd.osd-default-hdd service (weirdly enough, hdd-small still works).
>
> The affected devices still have the correct device class in the CRUSH
> tree, and the data placement also looks fine. The orchestrator
> service listing, however, is incorrect. I tried cleaning out and freshly
> redeploying one of the SSD OSDs, but the redeployed service still has
> the following in its unit.meta file:
>
> {
>     "service_name": "osd.osd-default-hdd",
>     "ports": [],
>     "ip": null,
>     "deployed_by": [
>         "quay.io/ceph/ceph@sha256:ac06cdca6f2512a763f1ace8553330e454152b82f95a2b6bf33c3f3ec2eeac77",
>         "quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c3172b0b23b37906"
>     ],
>     "rank": null,
>     "rank_generation": null,
>     "extra_container_args": null,
>     "extra_entrypoint_args": null,
>     "memory_request": null,
>     "memory_limit": null
> }
>
> Any idea what might be causing this? I'm on Ceph 18.2.4 (an upgrade is
> planned, but I need to wait out some remapped PGs first).
>
> Janek

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
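
For readers hitting the same symptom: a minimal sketch of cephadm commands that can help confirm which OSD spec the orchestrator matches each device against. The file name osd-specs.yaml is only a placeholder for wherever the three definitions above are kept, and the exact output fields vary by release, so treat this as a starting point rather than a definitive procedure.

# Show the OSD specs exactly as the orchestrator has stored them
$ ceph orch ls --service-type osd --export

# List the devices the orchestrator sees on each host, including size and rotational flags
$ ceph orch device ls --wide

# Preview which devices each spec would claim, without actually deploying anything
$ ceph orch apply -i osd-specs.yaml --dry-run

# Check which service_name each running OSD daemon is currently attributed to
$ ceph orch ps --daemon-type osd --format yaml | grep -E 'daemon_id|service_name'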
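
Once the debug logging Frédéric describes is enabled, the orchestrator's decisions should appear in the cephadm log on the active MGR host. Something along these lines (the grep pattern is just an example built from the service names above) can narrow the output down to the affected services:

$ grep -iE 'osd-default-ssd|osd-default-hdd' /var/log/ceph/$(ceph fsid)/ceph.cephadm.log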