You should either zap the devices with
ceph orch device zap my_hostname my_path --force
or use ceph-volume directly on that host:
cephadm ceph-volume lvm zap --destroy /dev/sdX
IIRC there's a backup of the partition table at the end of the disk.
I would expect ceph-volume to identify those drives as unavailable,
but apparently they still show up as available?
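If a leftover backup GPT at the end of the drive is what trips it up,
something like this should clear both copies (the device name is just an
example, so double-check it first):

sgdisk --zap-all /dev/sdp   # removes the primary and backup GPT plus the protective MBR
wipefs -a /dev/sdp          # clears any remaining filesystem/LVM signatures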
If 4 of 5 nodes have successfully created OSDs, could you set the osd
specs to "unmanaged: true" and then zap all OSD devices on that
failing host again with 'ceph orch device zap'?
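A rough sketch of what I mean, assuming the spec below is the one you
applied (the filename is just a placeholder):

service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
unmanaged: true
data_devices:
  rotational: 1
db_devices:
  rotational: 0

ceph orch apply -i osd_spec.yml

With unmanaged set, cephadm won't immediately try to recreate OSDs on the
freshly zapped devices until you flip it back to false.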
If the zap finishes successfully, could you then run this command on the
failing OSD host and paste the output here:
cephadm ceph-volume inventory
and maybe also this:
lsblk -o name,rota,size
Quoting Chris <hagfelsh@xxxxxxxxx>:
Hi! So I nuked the cluster, zapped all the disks, and redeployed.
Then I applied this osd spec (this time via the dashboard since I was full
of hope):
service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
--dry-run showed exactly what I hoped to see.
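(For reference, the CLI equivalent would be something like
"ceph orch apply -i osd_spec.yml --dry-run", with the spec above saved to a
file; the filename is just an example.)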
Upon application, hosts 1-4 worked just fine. Host 5... not so much. I see
logical volumes being created, but no OSDs are coming online. Moreover,
cephadm has spent days on host 5 building just a few LVs.
I nuked all the LVs on that host, then zapped with sgdisk, then dd'd the
drives with /dev/urandom, then rebooted... and the problem persists!
cephadm starts creating VGs/LVs, but no new OSDs come up.
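For the record, the cleanup per drive looked roughly like this (the VG name,
device name and dd extent are placeholders):

vgremove -f ceph-<vg-uuid>        # also removes the LVs inside the VG
sgdisk --zap-all /dev/sdX         # wipes the GPT/MBR structures
dd if=/dev/urandom of=/dev/sdX bs=1M count=1024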
This wall of text might have a hint... but it's not true! There's no
partition on these! They've been wiped with /dev/urandom!
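If it helps, checks like these should show whether any partition table or
signature survived (sdp is just an example):

sgdisk --print /dev/sdp   # should not find a valid partition table
wipefs /dev/sdp           # without -a this only lists detected signatures
lsblk -f /dev/sdp         # no filesystem or LVM metadata expected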
Here's a dump of a relevant part of /var/log/ceph/cephadm.log. Since
formatting is stripped, I've spaced out the interesting part. It's a shame
this process is still so unreliable.
2021-10-05 20:43:41,499 INFO Non-zero exit code 1 from /usr/bin/docker run
--rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint
/usr/sbin/ceph-volume --privileged --group-add=disk --init -e
CONTAINER_IMAGE=
quay.io/ceph/ceph@sha256:5755c3a5c197ef186b8186212e023565f15b799f1ed411207f2c3fcd4a80ab45
-e NODE_NAME=ceph05 -e CEPH_USE_RANDOM_NONCE=1 -e
CEPH_VOLUME_OSDSPEC_AFFINITY=dashboard-admin-1633379370439 -v
/var/run/ceph/23e192fe-221d-11ec-a2cb-a16209e26d65:/var/run/ceph:z -v
/var/log/ceph/23e192fe-221d-11ec-a2cb-a16209e26d65:/var/log/ceph:z -v
/var/lib/ceph/23e192fe-221d-11ec-a2cb-a16209e26d65/crash:/var/lib/ceph/crash:z
-v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
/run/lock/lvm:/run/lock/lvm -v /tmp/ceph-tmpu5c6jw0u:/etc/ceph/ceph.conf:z
-v /tmp/ceph-tmpk1wgba4u:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
quay.io/ceph/ceph@sha256:5755c3a5c197ef186b8186212e023565f15b799f1ed411207f2c3fcd4a80ab45
lvm batch --no-auto /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo
/dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx
--wal-devices /dev/sdp /dev/sdq --yes --no-systemd
2021-10-05 20:43:41,499 INFO /usr/bin/docker: stderr --> passed data
devices: 21 physical, 0 LVM
2021-10-05 20:43:41,499 INFO /usr/bin/docker: stderr --> relative data
size: 1.0
2021-10-05 20:43:41,499 INFO /usr/bin/docker: stderr --> passed block_wal
devices: 2 physical, 0 LVM
2021-10-05 20:43:41,500 INFO /usr/bin/docker: stderr Running command:
/usr/bin/ceph-authtool --gen-print-key
2021-10-05 20:43:41,500 INFO /usr/bin/docker: stderr Running command:
/usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
a97fda7a-586f-4ced-86e0-b0a18e081ec7
2021-10-05 20:43:41,500 INFO /usr/bin/docker: stderr Running command:
/usr/sbin/vgcreate --force --yes ceph-19158c90-90e6-4a37-98e2-7e0e45cd5e27
/dev/sdn
2021-10-05 20:43:41,500 INFO /usr/bin/docker: stderr stdout: Physical
volume "/dev/sdn" successfully created.
2021-10-05 20:43:41,500 INFO /usr/bin/docker: stderr stdout: Volume group
"ceph-19158c90-90e6-4a37-98e2-7e0e45cd5e27" successfully created
2021-10-05 20:43:41,500 INFO /usr/bin/docker: stderr Running command:
/usr/sbin/lvcreate --yes -l 238467 -n
osd-block-a97fda7a-586f-4ced-86e0-b0a18e081ec7
ceph-19158c90-90e6-4a37-98e2-7e0e45cd5e27
2021-10-05 20:43:41,500 INFO /usr/bin/docker: stderr stdout: Logical
volume "osd-block-a97fda7a-586f-4ced-86e0-b0a18e081ec7" created.
2021-10-05 20:43:41,501 INFO /usr/bin/docker: stderr Running command:
/usr/sbin/vgcreate --force --yes ceph-84b7458f-4888-41a7-a6d6-031d85bfc9e4
/dev/sdp
2021-10-05 20:43:41,501 INFO /usr/bin/docker: stderr *stderr: Cannot use
/dev/sdp: device is partitioned*
2021-10-05 20:43:41,501 INFO /usr/bin/docker: stderr Command requires all
devices to be found.
2021-10-05 20:43:41,501 INFO /usr/bin/docker: stderr --> Was unable to
complete a new OSD, will rollback changes
2021-10-05 20:43:41,501 INFO /usr/bin/docker: stderr Running command:
/usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.84
--yes-i-really-mean-it
2021-10-05 20:43:41,501 INFO /usr/bin/docker: stderr stderr: purged osd.84
2021-10-05 20:43:41,501 INFO /usr/bin/docker: stderr --> RuntimeError:
command returned non-zero exit status: 5
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx