Hi,
There is no down OSD:
Yeah, I see the same in my lab; it also shows more daemons than I actually
have for that custom spec. I think it is also counting some failed
attempts to deploy OSDs. In my case it shows 4/8, although I only have
8 OSDs in total, and I changed the spec a couple of times in order to
trigger cephadm to apply it. I don't really have an answer to that.
As to why it applied the wrong spec, I assume that ceph-volume in
combination with cephadm still has some flaws. I have been playing
around with cephadm in different setups, e.g. trying to change a
host's layout from standalone OSDs to OSDs with separate DBs, but it
didn't really work as expected. That's one of the reasons I'm still
struggling to upgrade our production Nautilus cluster to Octopus. But
I can't really tell whether I still have some misunderstandings about
it, whether it's really still buggy, or whether it's related to the
virtual environment.
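What I usually do to see where the extra count comes from is compare
the daemons cephadm actually reports with the specs it is trying to
match them against; something like this (no guarantee it explains your
16/24, but it's a start):

# ceph orch ps --daemon-type osd
# ceph orch ls osd --export

and the cephadm log ('ceph log last cephadm') usually shows the failed
deployment attempts I mentioned.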
Regards,
Eugen
Quoting "[AR] Guillaume de Lafond" <gdelafond@xxxxxxxxxxx>:
Hello,
On 12 Nov 2021, at 18:03, Eugen Block <eblock@xxxxxx> wrote:
Another question: why does “ceph orch ls osd” report the value x/24 in
the RUNNING column; why 24?
Can you share your 'ceph osd tree' and maybe also 'ceph -s'? I
would assume that you have a few dead or down OSDs, but it's hard
to tell.
There is no down OSD:
# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME        STATUS  REWEIGHT  PRI-AFF
 -1         225.28284  root default
-17          28.16035      host host10
  7    hdd    9.09569          osd.7        up   1.00000  1.00000
 15    hdd    9.09569          osd.15       up   1.00000  1.00000
 25    hdd    9.09569          osd.25       up   1.00000  1.00000
 23    ssd    0.87329          osd.23       up   1.00000  1.00000
 -7          28.16035      host host11
  2    hdd    9.09569          osd.2        up   1.00000  1.00000
 14    hdd    9.09569          osd.14       up   1.00000  1.00000
 31    hdd    9.09569          osd.31       up   1.00000  1.00000
 18    ssd    0.87329          osd.18       up   1.00000  1.00000
-11          28.16035      host host12
  6    hdd    9.09569          osd.6        up   1.00000  1.00000
  9    hdd    9.09569          osd.9        up   1.00000  1.00000
 30    hdd    9.09569          osd.30       up   1.00000  1.00000
 16    ssd    0.87329          osd.16       up   1.00000  1.00000
 -5          28.16035      host host13
  4    hdd    9.09569          osd.4        up   1.00000  1.00000
 12    hdd    9.09569          osd.12       up   1.00000  1.00000
 26    hdd    9.09569          osd.26       up   1.00000  1.00000
 20    ssd    0.87329          osd.20       up   1.00000  1.00000
 -3          28.16035      host host14
  0    hdd    9.09569          osd.0        up   1.00000  1.00000
 11    hdd    9.09569          osd.11       up   1.00000  1.00000
 29    hdd    9.09569          osd.29       up   1.00000  1.00000
 17    ssd    0.87329          osd.17       up   1.00000  1.00000
-15          28.16035      host host15
  3    hdd    9.09569          osd.3        up   1.00000  1.00000
 10    hdd    9.09569          osd.10       up   1.00000  1.00000
 28    hdd    9.09569          osd.28       up   1.00000  1.00000
 21    ssd    0.87329          osd.21       up   1.00000  1.00000
-13          28.16035      host host16
  1    hdd    9.09569          osd.1        up   1.00000  1.00000
  8    hdd    9.09569          osd.8        up   1.00000  1.00000
 24    hdd    9.09569          osd.24       up   1.00000  1.00000
 22    ssd    0.87329          osd.22       up   1.00000  1.00000
 -9          28.16035      host host17
  5    hdd    9.09569          osd.5        up   1.00000  1.00000
 13    hdd    9.09569          osd.13       up   1.00000  1.00000
 27    hdd    9.09569          osd.27       up   1.00000  1.00000
 19    ssd    0.87329          osd.19       up   1.00000  1.00000
# ceph -s
  cluster:
    id:     58452b76-e3cc-11eb-b895-2132fd5f9203
    health: HEALTH_WARN
            158 pgs not deep-scrubbed in time

  services:
    mon: 5 daemons, quorum host10,host11,host12,host13,host14 (age 11d)
    mgr: host12.rwmuiw(active, since 11d), standbys: host13.jennry, host14.xlexye, host10.pknkwk, host11.mfhlwn
    osd: 32 osds: 32 up (since 4h), 32 in (since 2d); 7 remapped pgs
    rgw: 24 daemons active (8 hosts, 3 zones)

  data:
    pools:   24 pools, 961 pgs
    objects: 24.69M objects, 41 TiB
    usage:   63 TiB used, 162 TiB / 225 TiB avail
    pgs:     2410903/132768492 objects misplaced (1.816%)
             954 active+clean
             7   active+remapped+backfilling

  io:
    client:   1.6 MiB/s rd, 887 KiB/s wr, 679 op/s rd, 73 op/s wr
    recovery: 19 MiB/s, 36 objects/s
1/ see which disks are in each OSD service_id?
You can see that in the output of
cephadm ceph-volume lvm list
Ok thank you.
$ ansible -i inventory -m shell -a "cephadm ceph-volume lvm list --format json 2>/dev/null | jq -r '. | keys[] as \$k | \"osd \(\$k): \(.[\$k] | .[] | .devices[]) \(.[\$k] | .[] | .tags | .\"ceph.osdspec_affinity\")\"'" ceph_nodes | grep spec | sort -k 2 -n
osd 0: /dev/sdd ar_osd_hdd_spec
osd 1: /dev/sdd ar_osd_hdd_spec
osd 2: /dev/sdd ar_osd_hdd_spec
osd 3: /dev/sdd ar_osd_hdd_spec
osd 4: /dev/sdd ar_osd_hdd_spec
osd 5: /dev/sdd ar_osd_hdd_spec
osd 6: /dev/sdd ar_osd_hdd_spec
osd 7: /dev/sdd ar_osd_hdd_spec
osd 8: /dev/sde ar_osd_hdd_spec
osd 9: /dev/sde ar_osd_hdd_spec
osd 10: /dev/sde ar_osd_hdd_spec
osd 11: /dev/sde ar_osd_hdd_spec
osd 12: /dev/sde ar_osd_hdd_spec
osd 13: /dev/sde ar_osd_hdd_spec
osd 14: /dev/sde ar_osd_hdd_spec
osd 15: /dev/sde ar_osd_hdd_spec
osd 16: /dev/sdc ar_osd_ssd_spec
osd 17: /dev/sdc ar_osd_ssd_spec
osd 18: /dev/sdc ar_osd_ssd_spec
osd 19: /dev/sdc ar_osd_ssd_spec
osd 20: /dev/sdc ar_osd_ssd_spec
osd 21: /dev/sdc ar_osd_ssd_spec
osd 22: /dev/sdc ar_osd_ssd_spec
osd 23: /dev/sdc ar_osd_ssd_spec
osd 24: /dev/sdf ar_osd_hdd_spec
osd 25: /dev/sdf ar_osd_hdd_spec
osd 26: /dev/sdf ar_osd_hdd_spec
osd 27: /dev/sdf ar_osd_hdd_spec
osd 28: /dev/sdf ar_osd_hdd_spec
osd 29: /dev/sdf ar_osd_hdd_spec
osd 30: /dev/sdf ar_osd_hdd_spec
osd 31: /dev/sdf ar_osd_hdd_spec
=> 24 ar_osd_hdd_spec
=> 8 ar_osd_ssd_spec
That seems OK!
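(Side note: on a single host the same information can be pulled without
ansible; a shorter sketch of the jq query above, assuming the same JSON
layout and taking only the first data device of each OSD:

$ cephadm ceph-volume lvm list --format json 2>/dev/null | jq -r 'to_entries[] | "osd \(.key): \(.value[0].devices[0]) \(.value[0].tags["ceph.osdspec_affinity"])"'
)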
So why does “ceph orch ls osd” report 16/24 for both osd.ar_osd_hdd_spec
and osd.ar_osd_ssd_spec?
Am I missing something, or is it a cephadm bug?
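(A quick cross-check of the RUNNING column is to count the daemons
cephadm itself reports per service, e.g.:

$ ceph orch ps --daemon-type osd --format json | jq -r '.[].service_name' | sort | uniq -c

assuming the JSON exposes a service_name field per daemon; adjust the
key if yours differs.)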
Regards,
—
Guillaume de Lafond
Aqua Ray
Quoting "[AR] Guillaume CephML" <gdelafond+cephml@xxxxxxxxxxx>:
Hello,
I got something strange on a Pacific (16.2.6) cluster.
I have added 8 new empty spinning disks to this running cluster, which
is configured with:
# ceph orch ls osd --export
service_type: osd
service_id: ar_osd_hdd_spec
service_name: osd.ar_osd_hdd_spec
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: ar_osd_ssd_spec
service_name: osd.ar_osd_ssd_spec
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 0
  filter_logic: AND
  objectstore: bluestore
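(For reference, re-applying the exported specs with --dry-run should
preview which devices each spec would pick up; a sketch, assuming the
export above is saved to osd_specs.yml:

# ceph orch ls osd --export > osd_specs.yml
# ceph orch apply -i osd_specs.yml --dry-run
)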
Before adding them I had:
# ceph orch ls osd
NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd.ar_osd_hdd_spec          16/24    8m ago     4M   *
osd.ar_osd_ssd_spec           8/16    8m ago     4M   *
After adding the disk I have:
# ceph orch ls osd
NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd.ar_osd_hdd_spec          16/24    8m ago     4M   *
osd.ar_osd_ssd_spec          16/24    8m ago     4M   *
I do not understand why the disks have been detected as osd.ar_osd_ssd_spec.
The new disks are on /dev/sdf.
# ceph orch device ls --wide
Hostname  Path      Type  Transport  RPM      Vendor  Model             Size   Health  Ident  Fault  Avail  Reject Reasons
host10    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host10    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host10    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host10    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host11    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host11    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host11    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host11    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host12    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host12    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host12    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host12    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host13    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host13    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host13    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host13    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host14    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host14    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host14    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host14    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host15    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host15    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host15    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host15    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host16    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host16    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host16    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host16    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host17    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host17    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host17    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host17    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
# for f in /sys/block/sd[cdef]/queue/rotational; do printf "$f is "; cat $f; done
/sys/block/sdc/queue/rotational is 0
/sys/block/sdd/queue/rotational is 1
/sys/block/sde/queue/rotational is 1
/sys/block/sdf/queue/rotational is 1
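lsblk reports the same rotational flag, if you want to double-check it
against what ceph orch device ls sees:
# lsblk -d -o NAME,ROTA,SIZE,MODEL /dev/sd[cdef]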
Is there a way to:
1/ see which disks are in each OSD service_id?
2/ move a disk from one service_id to another one?
Another question: why does “ceph orch ls osd” report the value x/24 in
the RUNNING column; why 24?
Each server has (8 servers in the cluster):
# ceph-volume inventory

Device Path    Size       rotates  available  Model name
/dev/sda       59.00 GB   False    False      SuperMicro SSD
/dev/sdb       59.00 GB   False    False      SuperMicro SSD
/dev/sdc       894.25 GB  False    False      Micron_5300_MTFD
/dev/sdd       9.10 TB    True     False      HGST HUH721010AL
/dev/sde       9.10 TB    True     False      WDC WUS721010AL
/dev/sdf       9.10 TB    True     False      WDC WUS721010AL
PS: of course this is not a big problem since the two specs are
identical, but I did not understand why it did that.
PS2: on another Ceph 16.2.6 cluster that has the same service_spec, we
did not see the same strange behaviour: the disks were linked to the
right service_spec.
Thank you,
--
Guillaume de Lafond
Aqua Ray
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx