Re: [Pacific] OSD Spec problem?

Hi,

There is no down OSD:

Yeah, I see the same in my lab; it also shows more daemons than I actually have for that custom spec. I think it is also counting some failed attempts to deploy OSDs. In my case it's 4/8, but I only have 8 OSDs in total, and I changed the spec a couple of times in order to trigger cephadm to apply it, so I don't really have an answer to that. As to why it applied the wrong spec, I assume that ceph-volume in combination with cephadm still has some flaws. I have been playing around with cephadm in different setups, e.g. trying to change a host's layout from standalone OSDs to OSDs with separate DBs, but it didn't really work as expected. That's one of the reasons I'm still struggling to upgrade our production Nautilus cluster to Octopus. But I can't really tell whether I still have some misunderstanding about it, whether it's really still buggy, or whether it's related to the virtual environment.
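One way to cross-check what is really deployed versus what the RUNNING column suggests is to compare the daemons cephadm has actually placed with the specs it has stored. This is only a rough sketch with generic orchestrator commands, nothing specific to this cluster:

# list the OSD daemons cephadm has actually deployed, per host
ceph orch ps --daemon_type osd
# dump the specs exactly as the orchestrator stores them
ceph orch ls osd --export
# force a fresh device inventory after changing a spec
ceph orch device ls --refresh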

Regards,
Eugen


Zitat von "[AR] Guillaume de Lafond" <gdelafond@xxxxxxxxxxx>:

Hello,

On 12 Nov 2021, at 18:03, Eugen Block <eblock@xxxxxx> wrote:
Another question: why does "ceph orch ls osd" report the value x/24 in the RUNNING column? Why 24?
Can you share your 'ceph osd tree' and maybe also 'ceph -s'? I would assume that you have a few dead or down OSDs, but it's hard to tell.

There is no down OSD:
# ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME        STATUS  REWEIGHT  PRI-AFF
 -1         225.28284  root default
-17          28.16035      host host10
  7    hdd    9.09569          osd.7        up   1.00000  1.00000
 15    hdd    9.09569          osd.15       up   1.00000  1.00000
 25    hdd    9.09569          osd.25       up   1.00000  1.00000
 23    ssd    0.87329          osd.23       up   1.00000  1.00000
 -7          28.16035      host host11
  2    hdd    9.09569          osd.2        up   1.00000  1.00000
 14    hdd    9.09569          osd.14       up   1.00000  1.00000
 31    hdd    9.09569          osd.31       up   1.00000  1.00000
 18    ssd    0.87329          osd.18       up   1.00000  1.00000
-11          28.16035      host host12
  6    hdd    9.09569          osd.6        up   1.00000  1.00000
  9    hdd    9.09569          osd.9        up   1.00000  1.00000
 30    hdd    9.09569          osd.30       up   1.00000  1.00000
 16    ssd    0.87329          osd.16       up   1.00000  1.00000
 -5          28.16035      host host13
  4    hdd    9.09569          osd.4        up   1.00000  1.00000
 12    hdd    9.09569          osd.12       up   1.00000  1.00000
 26    hdd    9.09569          osd.26       up   1.00000  1.00000
 20    ssd    0.87329          osd.20       up   1.00000  1.00000
 -3          28.16035      host host14
  0    hdd    9.09569          osd.0        up   1.00000  1.00000
 11    hdd    9.09569          osd.11       up   1.00000  1.00000
 29    hdd    9.09569          osd.29       up   1.00000  1.00000
 17    ssd    0.87329          osd.17       up   1.00000  1.00000
-15          28.16035      host host15
  3    hdd    9.09569          osd.3        up   1.00000  1.00000
 10    hdd    9.09569          osd.10       up   1.00000  1.00000
 28    hdd    9.09569          osd.28       up   1.00000  1.00000
 21    ssd    0.87329          osd.21       up   1.00000  1.00000
-13          28.16035      host host16
  1    hdd    9.09569          osd.1        up   1.00000  1.00000
  8    hdd    9.09569          osd.8        up   1.00000  1.00000
 24    hdd    9.09569          osd.24       up   1.00000  1.00000
 22    ssd    0.87329          osd.22       up   1.00000  1.00000
 -9          28.16035      host host17
  5    hdd    9.09569          osd.5        up   1.00000  1.00000
 13    hdd    9.09569          osd.13       up   1.00000  1.00000
 27    hdd    9.09569          osd.27       up   1.00000  1.00000
 19    ssd    0.87329          osd.19       up   1.00000  1.00000

# ceph -s
  cluster:
    id:     58452b76-e3cc-11eb-b895-2132fd5f9203
    health: HEALTH_WARN
            158 pgs not deep-scrubbed in time

  services:
    mon: 5 daemons, quorum host10,host11,host12,host13,host14 (age 11d)
    mgr: host12.rwmuiw (active, since 11d), standbys: host13.jennry, host14.xlexye, host10.pknkwk, host11.mfhlwn
    osd: 32 osds: 32 up (since 4h), 32 in (since 2d); 7 remapped pgs
    rgw: 24 daemons active (8 hosts, 3 zones)

  data:
    pools:   24 pools, 961 pgs
    objects: 24.69M objects, 41 TiB
    usage:   63 TiB used, 162 TiB / 225 TiB avail
    pgs:     2410903/132768492 objects misplaced (1.816%)
             954 active+clean
             7   active+remapped+backfilling

  io:
    client:   1.6 MiB/s rd, 887 KiB/s wr, 679 op/s rd, 73 op/s wr
    recovery: 19 MiB/s, 36 objects/s



1/ see which disks are in each OSD service_id?
You can see that in the output of
cephadm ceph-volume lvm list

Ok thank you.

$ ansible -i inventory -m shell -a "cephadm ceph-volume lvm list --format json 2>/dev/null | jq -r '. | keys[] as \$k | \"osd \(\$k): \(.[\$k] | .[] | .devices[]) \(.[\$k] | .[] | .tags | .\"ceph.osdspec_affinity\")\"'" ceph_nodes | grep spec | sort -k 2 -n
osd 0: /dev/sdd ar_osd_hdd_spec
osd 1: /dev/sdd ar_osd_hdd_spec
osd 2: /dev/sdd ar_osd_hdd_spec
osd 3: /dev/sdd ar_osd_hdd_spec
osd 4: /dev/sdd ar_osd_hdd_spec
osd 5: /dev/sdd ar_osd_hdd_spec
osd 6: /dev/sdd ar_osd_hdd_spec
osd 7: /dev/sdd ar_osd_hdd_spec
osd 8: /dev/sde ar_osd_hdd_spec
osd 9: /dev/sde ar_osd_hdd_spec
osd 10: /dev/sde ar_osd_hdd_spec
osd 11: /dev/sde ar_osd_hdd_spec
osd 12: /dev/sde ar_osd_hdd_spec
osd 13: /dev/sde ar_osd_hdd_spec
osd 14: /dev/sde ar_osd_hdd_spec
osd 15: /dev/sde ar_osd_hdd_spec
osd 16: /dev/sdc ar_osd_ssd_spec
osd 17: /dev/sdc ar_osd_ssd_spec
osd 18: /dev/sdc ar_osd_ssd_spec
osd 19: /dev/sdc ar_osd_ssd_spec
osd 20: /dev/sdc ar_osd_ssd_spec
osd 21: /dev/sdc ar_osd_ssd_spec
osd 22: /dev/sdc ar_osd_ssd_spec
osd 23: /dev/sdc ar_osd_ssd_spec
osd 24: /dev/sdf ar_osd_hdd_spec
osd 25: /dev/sdf ar_osd_hdd_spec
osd 26: /dev/sdf ar_osd_hdd_spec
osd 27: /dev/sdf ar_osd_hdd_spec
osd 28: /dev/sdf ar_osd_hdd_spec
osd 29: /dev/sdf ar_osd_hdd_spec
osd 30: /dev/sdf ar_osd_hdd_spec
osd 31: /dev/sdf ar_osd_hdd_spec

=> 24 ar_osd_hdd_spec
=> 8 ar_osd_ssd_spec

That seems OK!

So why "ceph orch ls osd” reports 16/24 for both osd.ar_osd_hdd_spec and osd.ar_osd_ssd_spec?
Do I miss something or is it a cephadm bug?
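One thing that might narrow it down (just a sketch, assuming the standard orchestrator CLI; the loop below simply counts whatever 'ceph orch ps' lists per service) is to compare the daemons cephadm attributes to each spec with the numerators shown by 'ceph orch ls osd':

# count the OSD daemons cephadm attributes to each spec
for s in osd.ar_osd_hdd_spec osd.ar_osd_ssd_spec; do
    printf '%s: ' "$s"
    ceph orch ps --service_name "$s" --format plain | tail -n +2 | wc -l
done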

Regards,
—
Guillaume de Lafond
Aqua Ray

Zitat von "[AR] Guillaume CephML" <gdelafond+cephml@xxxxxxxxxxx>:

Hello,

I ran into something strange on a Pacific (16.2.6) cluster.
I have added 8 new empty spinning disks to this running cluster, which is configured with:

# ceph orch ls osd --export
service_type: osd
service_id: ar_osd_hdd_spec
service_name: osd.ar_osd_hdd_spec
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: ar_osd_ssd_spec
service_name: osd.ar_osd_ssd_spec
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 0
  filter_logic: AND
  objectstore: bluestore
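As a side note on the spec format (only a sketch, not something that was applied here): besides rotational, the data_devices section of a drive group spec can also filter on model or size, which would make the HDD spec match only the 10 TB drives regardless of what the rotational flag says. The file name below is made up, and --dry-run only previews what cephadm would do:

cat > /tmp/ar_osd_hdd_spec.yml <<'EOF'
service_type: osd
service_id: ar_osd_hdd_spec
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
    size: '9TB:'          # only devices of 9 TB and larger
  filter_logic: AND
  objectstore: bluestore
EOF
# preview which disks the spec would match before applying it for real
ceph orch apply -i /tmp/ar_osd_hdd_spec.yml --dry-run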


Before adding them I had:
#  ceph orch ls osd
NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd.ar_osd_hdd_spec           16/24  8m ago     4M   *
osd.ar_osd_ssd_spec           8/16   8m ago     4M   *

After adding the disk I have:
#  ceph orch ls osd
NAME                 PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
osd.ar_osd_hdd_spec           16/24  8m ago     4M   *
osd.ar_osd_ssd_spec           16/24  8m ago     4M   *

I do not understand why the disks have been detected as osd.ar_osd_ssd_spec.
The new disks are on /dev/sdf.

# ceph orch device ls --wide
Hostname  Path      Type  Transport  RPM      Vendor  Model             Size   Health  Ident  Fault  Avail  Reject Reasons
host10    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host10    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host10    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host10    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host11    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host11    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host11    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host11    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host12    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host12    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host12    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host12    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host13    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host13    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host13    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host13    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host14    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host14    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host14    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host14    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host15    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host15    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host15    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host15    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host16    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host16    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host16    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host16    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host17    /dev/sdc  ssd   ATA/SATA   Unknown  ATA     Micron_5300_MTFD  960G   Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host17    /dev/sdd  hdd   ATA/SATA   7200     ATA     HGST HUH721010AL  10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host17    /dev/sde  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked
host17    /dev/sdf  hdd   ATA/SATA   7200     ATA     WDC WUS721010AL   10.0T  Good    N/A    N/A    No     Insufficient space (<10 extents) on vgs, LVM detected, locked

# for f in /sys/block/sd[cdef]/queue/rotational; do printf "$f is "; cat $f; done
/sys/block/sdc/queue/rotational is 0
/sys/block/sdd/queue/rotational is 1
/sys/block/sde/queue/rotational is 1
/sys/block/sdf/queue/rotational is 1
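It might also be worth checking what ceph-volume itself reports for those devices, since that is the view cephadm uses when matching specs. A rough sketch for one host; the jq field names assume the usual ceph-volume inventory JSON layout, so treat them as an assumption:

# what ceph-volume sees for each device, including its rotational flag
cephadm ceph-volume inventory --format json 2>/dev/null \
  | jq -r '.[] | "\(.path) rotational=\(.sys_api.rotational) available=\(.available)"'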

Is there a way to:
1/ see which disks are in each OSD service_id?
2/ move a disk from one service_id to another one?

Another question: why does "ceph orch ls osd" report the value x/24 in the RUNNING column? Why 24?
Each server has (8 servers in the cluster):
# ceph-volume inventory
Device Path               Size         rotates available Model name
/dev/sda                  59.00 GB     False   False     SuperMicro SSD
/dev/sdb                  59.00 GB     False   False     SuperMicro SSD
/dev/sdc                  894.25 GB    False   False     Micron_5300_MTFD
/dev/sdd                  9.10 TB      True    False     HGST HUH721010AL
/dev/sde                  9.10 TB      True    False     WDC  WUS721010AL
/dev/sdf                  9.10 TB      True    False     WDC  WUS721010AL

PS: of course this is not a big problem, as the two specs are equal, but I did not understand why it did that.
PS2: on another Ceph 16.2.6 cluster that has the same service_spec, we did not see the same strange behaviour: the disks were linked to the right service_spec.

Thank you,
--
Guillaume de Lafond
Aqua Ray




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



