Re: cephadm bootstraps cluster with bad CRUSH map(?)

Hi,

On 20/05/2024 17:29, Anthony D'Atri wrote:
>> On May 20, 2024, at 12:21 PM, Matthew Vernon <mvernon@xxxxxxxxxxxxx> wrote:
>>
>> This has left me with a single sad pg:
>> [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
>>     pg 1.0 is stuck inactive for 33m, current state unknown, last acting []
>
> .mgr pool perhaps.

I think so.
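
(To double-check which pool PG 1.0 belongs to, something like the
following should confirm it; I believe pool 1 is normally .mgr on a
fresh cephadm bootstrap:)

  ceph osd pool ls detail | grep "^pool 1 "   # which pool has id 1
  ceph pg map 1.0                             # where the mons think it maps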

>> ceph osd tree shows that CRUSH picked up my racks OK, e.g.:
>> -3          45.11993  rack B4
>> -2          45.11993      host moss-be1001
>> 1    hdd    3.75999          osd.1             up   1.00000  1.00000
>
> Please send the entire first 10 lines or so of `ceph osd tree`

root@moss-be1001:/# ceph osd tree
ID  CLASS  WEIGHT     TYPE NAME             STATUS  REWEIGHT  PRI-AFF
-7         176.11194  rack F3
-6         176.11194      host moss-be1003
 2    hdd    7.33800          osd.2             up   1.00000  1.00000
 3    hdd    7.33800          osd.3             up   1.00000  1.00000
 6    hdd    7.33800          osd.6             up   1.00000  1.00000
 9    hdd    7.33800          osd.9             up   1.00000  1.00000
12    hdd    7.33800          osd.12            up   1.00000  1.00000
13    hdd    7.33800          osd.13            up   1.00000  1.00000
16    hdd    7.33800          osd.16            up   1.00000  1.00000
19    hdd    7.33800          osd.19            up   1.00000  1.00000


>> I passed this config to bootstrap with --config:
>>
>> [global]
>>   osd_crush_chooseleaf_type = 3
>
> Why did you set that?  3 is an unusual value.  AIUI most of the time the only reason to change this option is if one is setting up a single-node sandbox - and perhaps localpools create a rule using it.  I suspect this is at least part of your problem.

I wanted rack rather than host as the failure domain, i.e. to ensure that each replica goes into a different rack (academic at the moment, as I have three hosts, one in each rack, but important for future expansion). CRUSH bucket type 3 is "rack" in the default hierarchy, hence the value.
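
(My understanding is that the same effect can also be had after
bootstrap, without changing osd_crush_chooseleaf_type globally, by
creating a rack-level replicated rule and pointing pools at it - a
rough sketch, with the rule name and device class purely illustrative:)

  # replicate across racks under the default root, hdd class only
  ceph osd crush rule create-replicated replicated_rack default rack hdd
  # switch an existing pool over to the new rule
  ceph osd pool set <pool-name> crush_rule replicated_rack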

>> Once the cluster was up I used an osd spec file that looked like:
>>
>> service_type: osd
>> service_id: rrd_single_NVMe
>> placement:
>>   label: "NVMe"
>> spec:
>>   data_devices:
>>     rotational: 1
>>   db_devices:
>>     model: "NVMe"
>
> Is it your intent to use spinners for payload data and SSD for metadata?

Yes.
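
(For reference, a spec like that is applied via the orchestrator;
roughly, assuming it was saved as rrd_single_NVMe.yaml:)

  # preview what cephadm would do, then apply for real
  ceph orch apply osd -i rrd_single_NVMe.yaml --dry-run
  ceph orch apply osd -i rrd_single_NVMe.yaml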

Regards,

Matthew
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



