Hi,

I'm having a strange problem with the orchestrator. My cluster has the following OSD services configured based on certain attributes of the disks:
NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
...
osd.osd-default-hdd         1351     2m ago     22m  label:osd;HOSTPREFIX*
osd.osd-default-ssd         0        -          22m  label:osd;HOSTPREFIX*
osd.osd-small-hdd           41       2m ago     22m  label:osd;HOSTPREFIX*

These apply to three device types: large HDDs (8TB+), small HDDs (250G-7TB), and SSDs (1TB+). I did that with the following YAML definition:
service_type: osd
service_id: osd-default-hdd
service_name: osd.osd-default-hdd
placement:
  host_pattern: HOSTPREFIX*
  label: osd
spec:
  crush_device_class: hdd
  data_devices:
    rotational: 1
    size: '8T:'
  filter_logic: AND
  objectstore: bluestore
  osds_per_device: 1
---
service_type: osd
service_id: osd-default-ssd
service_name: osd.osd-default-ssd
placement:
  host_pattern: HOSTPREFIX*
  label: osd
spec:
  crush_device_class: ssd
  data_devices:
    rotational: 0
    size: '1T:'
  filter_logic: AND
  objectstore: bluestore
  osds_per_device: 1
---
service_type: osd
service_id: osd-small-hdd
service_name: osd.osd-small-hdd
placement:
  host_pattern: HOSTPREFIX*
  label: osd
spec:
  crush_device_class: hdd-small
  data_devices:
    rotational: 1
    size: 250G:7T
  filter_logic: AND
  objectstore: bluestore
  osds_per_device: 1

Previously this worked perfectly, but as you can see in the summary above, the orchestrator has now suddenly started to ignore the device class and data_devices filters for SSDs and incorrectly added all SSDs to the osd.osd-default-hdd service (oddly enough, hdd-small still works).
The affected devices still have the correct device class in the CRUSH tree, and data placement also looks fine. The orchestrator's service listing, however, is wrong. I tried cleaning out and freshly redeploying one of the SSD OSDs (roughly the commands sketched at the end of this mail), but the redeployed daemon still has the following in its unit.meta file:
{ "service_name": "osd.osd-default-hdd", "ports": [], "ip": null, "deployed_by": [ "quay.io/ceph/ceph@sha256:ac06cdca6f2512a763f1ace8553330e454152b82f95a2b6bf33c3f3ec2eeac77", "quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c3172b0b23b37906" ], "rank": null, "rank_generation": null, "extra_container_args": null, "extra_entrypoint_args": null, "memory_request": null, "memory_limit": null }Any idea what might be causing this? I'm on Ceph 18.2.4 (upgrade planned, but I need to wait out some remapped PGs first).
Janek