Ceph Orchestrator ignores attribute filters for SSDs

Hi,

I'm having a strange problem with the orchestrator. My cluster has the following OSD services configured based on certain attributes of the disks:

NAME                 PORTS  RUNNING  REFRESHED  AGE PLACEMENT
...
osd.osd-default-hdd            1351  2m ago     22m label:osd;HOSTPREFIX*
osd.osd-default-ssd               0  -          22m label:osd;HOSTPREFIX*
osd.osd-small-hdd                41  2m ago     22m label:osd;HOSTPREFIX*
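
For reference, that summary is trimmed to the OSD services; something like the following should reproduce it:

ceph orch ls osd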

These apply to three device types: large HDDs (8TB+), small HDDs (250G-7TB), and SSDs (1TB+). I set this up with the following YAML definitions:

service_type: osd
service_id: osd-default-hdd
service_name: osd.osd-default-hdd
placement:
  host_pattern: HOSTPREFIX*
  label: osd
spec:
  crush_device_class: hdd
  data_devices:
    rotational: 1
    size: '8T:'
  filter_logic: AND
  objectstore: bluestore
  osds_per_device: 1
---
service_type: osd
service_id: osd-default-ssd
service_name: osd.osd-default-ssd
placement:
  host_pattern: HOSTPREFIX*
  label: osd
spec:
  crush_device_class: ssd
  data_devices:
    rotational: 0
    size: '1T:'
  filter_logic: AND
  objectstore: bluestore
  osds_per_device: 1
---
service_type: osd
service_id: osd-small-hdd
service_name: osd.osd-small-hdd
placement:
  host_pattern: HOSTPREFIX*
  label: osd
spec:
  crush_device_class: hdd-small
  data_devices:
    rotational: 1
    size: 250G:7T
  filter_logic: AND
  objectstore: bluestore
  osds_per_device: 1
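
For completeness, I apply these roughly like this (the file name is just a placeholder for my local spec file), checking the dry-run preview first:

ceph orch apply -i osd_specs.yaml --dry-run
ceph orch apply -i osd_specs.yaml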


This used to work perfectly, but as you can see in the summary above, the orchestrator has suddenly started ignoring the device class and data_devices filters for SSDs: all SSDs are now incorrectly assigned to the osd.osd-default-hdd service (oddly enough, hdd-small still works).

The affected devices still have the correct device class in the CRUSH tree, and data placement also looks fine; only the orchestrator's service listing is wrong. I tried cleaning out and freshly redeploying one of the SSD OSDs, but the redeployed daemon still has the following in its unit.meta file:

{
    "service_name": "osd.osd-default-hdd",
    "ports": [],
    "ip": null,
    "deployed_by": [
"quay.io/ceph/ceph@sha256:ac06cdca6f2512a763f1ace8553330e454152b82f95a2b6bf33c3f3ec2eeac77",
"quay.io/ceph/ceph@sha256:6ac7f923aa1d23b43248ce0ddec7e1388855ee3d00813b52c3172b0b23b37906"
    ],
    "rank": null,
    "rank_generation": null,
    "extra_container_args": null,
    "extra_entrypoint_args": null,
    "memory_request": null,
    "memory_limit": null
}
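
In case it helps with reproducing this, the spec as the orchestrator currently sees it and the device inventory can be dumped with something like the following (the host name is a placeholder):

ceph orch ls osd --export
ceph orch device ls HOSTPREFIX01 --wide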

Any idea what might be causing this? I'm on Ceph 18.2.4 (upgrade planned, but I need to wait out some remapped PGs first).

Janek

