Hi,
Returning to this, it looks like the issue wasn't to do with how
osd_crush_chooseleaf_type was set; I destroyed and re-created my cluster
as before, and I have the same problem again:
pg 1.0 is stuck inactive for 10m, current state unknown, last acting []
As before, ceph osd tree shows:
root@moss-be1001:/# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME             STATUS  REWEIGHT  PRI-AFF
 -7         176.11194  rack F3
 -6         176.11194      host moss-be1003
 13    hdd    7.33800          osd.13            up   1.00000  1.00000
 15    hdd    7.33800          osd.15            up   1.00000  1.00000
And checking the crushmap, the default bucket is again empty:
root default {
id -1 # do not change unnecessarily
id -14 class hdd # do not change unnecessarily
# weight 0.00000
alg straw2
hash 0 # rjenkins1
}
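(For reference, that excerpt is from the decompiled crushmap; I pulled it
out with roughly the following - the output paths are just examples:

ceph osd getcrushmap -o /tmp/crushmap.bin     # fetch the compiled crushmap
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt   # decompile to text
)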
[by way of confirming that I didn't accidentally leave the old config
fragment lying around, the replication rule has:
step chooseleaf firstn 0 type host
]
So it looks like setting location: in my spec is breaking the cluster
bootstrap - the hosts aren't put into default, but neither are the
declared racks. As a reminder, that spec has host entries like:
service_type: host
hostname: moss-be1003
addr: 10.64.136.22
location:
  rack: F3
labels:
  - _admin
  - NVMe
Is this expected behaviour? Presumably I can fix the cluster by using
"ceph osd crush move F3 root=default" and similar for the others, but is
there a way to have what I want done by cephadm bootstrap?
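For concreteness, the manual fix I have in mind is along these lines
(rack names other than F3 are placeholders for the other racks in my spec):

ceph osd crush move F3 root=default    # link the F3 rack bucket under the default root
ceph osd crush move F4 root=default    # placeholder rack name
ceph osd crush move F5 root=default    # placeholder rack name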
Thanks,
Matthew