I'm afraid the parameter
mgr/cephadm/default_cephadm_command_timeout is buggy.
Once it is no longer at its default, the MGR adjusts the value a bit
(e.g. subtracting 5 secs) and thereby turns it into a float, but
cephadm is not having it (not even if I try the default 900 myself):
[WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s):
osd.all-available-devices
osd.all-available-devices: cephadm exited with an error code: 2,
stderr:usage:
cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b
[-h] [--image IMAGE] [--docker] [--data-dir DATA_DIR]
[--log-dir LOG_DIR] [--logrotate-dir LOGROTATE_DIR]
[--sysctl-dir SYSCTL_DIR] [--unit-dir UNIT_DIR] [--verbose]
[--timeout TIMEOUT] [--retry RETRY] [--env ENV] [--no-container-init]
[--no-cgroups-split]
{version,pull,inspect-image,ls,list-networks,adopt,rm-daemon,rm-cluster,run,shell,enter,ceph-volume,zap-osds,unit,logs,bootstrap,deploy,check-host,prepare-host,add-repo,rm-repo,install,registry-login,gather-facts,host-maintenance,agent,disk-rescan}
...
cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b:
error: argument --timeout: invalid int value: '895.0'
This also led to a status panic spiral reporting plenty of hosts and
services as missing or failing (I assume orch failing due to cephadm
complaining about the parameter).
I got it under control by removing the parameter again from the
config (ceph config rm mgr
mgr/cephadm/default_cephadm_command_timeout).
And then restarting all MGRs manually (systemctl restart..., again
since orch was kinda useless at this stage).
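For reference, a minimal sketch of the recovery steps (the systemd unit
name below is only the usual cephadm naming pattern, not copied from our
hosts; fsid, hostname and id differ per host):
```
# drop the offending option so the MGR falls back to the built-in default
ceph config rm mgr mgr/cephadm/default_cephadm_command_timeout

# then restart the MGR daemon on each MGR host
systemctl restart ceph-<fsid>@mgr.<hostname>.<id>.service
```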
Anyhow, is there any other way I can adapt this parameter?
Or maybe look into speeding up LV creation (if this is the bottleneck)?
Thanks a lot,
Mathias
-----Original Message-----
From: Kuhring, Mathias <mathias.kuhring@xxxxxxxxxxxxxx>
Sent: Friday, March 22, 2024 5:38 PM
To: Eugen Block <eblock@xxxxxx>; ceph-users@xxxxxxx
Subject: Re: [ext] Re: cephadm auto disk preparation
and OSD installation incomplete
Hey Eugen,
Thank you for the quick reply.
The 5 missing disks on the one host were completely installed after
I fully cleaned them up as I described.
So, it seems a smaller number of disks can make it.
Regarding the other host with 40 disks:
Failing the MGR didn't have any effect.
There are no errors in `/var/log/ceph/cephadm.log`.
But a bunch of repeating image listings like:
cephadm --image
quay.io/ceph/ceph@sha256:1fb108217b110c01c480e32d0cfea0e19955733537af7bb8cbae165222496e09 --timeout 895
ls
But `ceph log last 200 debug cephadm` gave me a bunch of interesting
errors (Excerpt below. Is there any preferred method to provide
bigger logs?).
So, there are some timeouts, which might play into the assumption
that ceph-volume is a bit overwhelmed by the number of disks.
It's a tentative assumption, but maybe LV creation is taking way too
long (is cephadm waiting for all of them in bulk?) and times out at
the default 900 secs.
However, LVs are created and cephadm will not consider them next
round ("has a filesystem").
I'm testing this theory right now by bumping up the limit to 2 hours
(and restarting with "fresh" disks again):
ceph config set mgr mgr/cephadm/default_cephadm_command_timeout 7200
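(To double-check what the MGR actually picked up, something like this
should work:)
```
ceph config get mgr mgr/cephadm/default_cephadm_command_timeout
```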
However, there are also mentions of the host not being reachable:
"Unable to reach remote host ceph-3-11"
But this seems to be limited to cephadm / ceph orch, so basically
MGR but not the rest of the cluster (i.e. MONs, OSDs, etc. are
communicating happily, as far as I can tell).
During my fresh run, I do notice more hosts being apparently down:
0|0[root@ceph-3-10 ~]# ceph orch host ls | grep Offline
ceph-3-7 172.16.62.38 rgw,osd,_admin Offline
ceph-3-10 172.16.62.41 rgw,osd,_admin,prometheus Offline
ceph-3-11 172.16.62.43 rgw,osd,_admin Offline
osd-mirror-2 172.16.62.23 rgw,osd,_admin Offline
osd-mirror-3 172.16.62.24 rgw,osd,_admin Offline
But I wonder if this is just a side effect of the MGR (cephadm/orch)
being too busy/overwhelmed with e.g. deploying the new OSDs.
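I guess I could double-check reachability from the orchestrator's side
with something along these lines (ceph-3-11 is just one of the offline
hosts):
```
# host list as seen by the orchestrator
ceph orch host ls

# let cephadm verify SSH connectivity and host prerequisites
ceph cephadm check-host ceph-3-11
```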
I will update you once the next round is done or failed.
Best Wishes,
Mathias
ceph log last 200 debug cephadm
...
2024-03-20T09:19:24.917834+0000 mgr.osd-mirror-4.dkzbkw
(mgr.339518816) 82122 : cephadm [INF] Detected new or changed
devices on ceph-3-11
2024-03-20T09:34:28.877718+0000 mgr.osd-mirror-4.dkzbkw
(mgr.339518816) 83339 : cephadm [ERR] Failed to apply
osd.all-available-devices spec
DriveGroupSpec.from_json(yaml.safe_load('''service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
spec:
  data_devices:
    all: true
  filter_logic: AND
  objectstore: bluestore
''')): Command timed out on host cephadm deploy (osd daemon)
(default 900 second timeout) ...
raise TimeoutError()
concurrent.futures._base.TimeoutError
During handling of the above exception, another exception occurred:
...
orchestrator._interface.OrchestratorError: Command timed out on host
cephadm deploy (osd daemon) (default 900 second timeout)
2024-03-20T09:34:28.881472+0000 mgr.osd-mirror-4.dkzbkw
(mgr.339518816) 83340 : cephadm [ERR] Task exception was never
retrieved
future: <Task finished
coro=<OSDService.create_from_spec.<locals>.all_hosts() done, defined
at /usr/share/ceph/mgr/cephadm/services/osd.py:72>
exception=RuntimeError('cephadm exited with an error code: 1,
stderr:Unable to reach remote host ceph-3-11. ',)> Traceback (most
recent call last):
File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 75, in all_hosts
return await gather(*futures)
File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 64, in
create_from_spec_one
replace_osd_ids=osd_id_claims_for_host, env_vars=env_vars
File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 96, in
create_single_host
code, '\n'.join(err)))
RuntimeError: cephadm exited with an error code: 1, stderr:Unable to
reach remote host ceph-3-11.
...
''')): cephadm exited with an error code: 1, stderr:Inferring config
/var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/config/ceph.conf
Non-zero exit code 1 from /usr/bin/docker run --rm --ipc=host
--stop-signal=SIGTERM --ulimit nofile=1048576 --net=host
--entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
--init -e
CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:1fb108217b110c01c480e32d0cfea0e19955733537af7bb8cbae165222496e09 -e NODE_NAME=ceph-3-11 -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_OSDSPEC_AFFINITY=all-available-devices -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b:/var/run/ceph:z -v /var/log/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b:/var/log/ceph:z -v /var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmpfoibulv3:/etc/ceph/ceph.conf:z -v /tmp/ceph-tmpjq5uxhj1:/var/lib/ceph/bootstrap-osd/ceph.keyring:z
quay.io/ceph/ceph@sha256:1fb108217b110c01c480e32d0cfea0e19955733537af7bb8cbae16522249
6e09 lvm batch --no-auto /dev/sdm /dev/sdn /dev/sdo /dev/sdp
/dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw
/dev/sdx /dev/sdy /dev/sdz --yes --no-systemd ...
/usr/bin/docker: stderr raise RuntimeError("Device {} has a
filesystem.".format(self.dev_path))
/usr/bin/docker: stderr RuntimeError: Device /dev/sdm has a filesystem.
...
RuntimeError: cephadm exited with an error code: 1, stderr:Inferring
config
/var/lib/ceph/7efa00f9-182f-40f4-9136-d51895db1f0b/config/ceph.conf
...
-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Thursday, March 21, 2024 3:28 PM
To: ceph-users@xxxxxxx
Subject: [ext] Re: cephadm auto disk preparation and
OSD installation incomplete
Hi,
before getting into that, the first thing I would do is fail the
mgr. There have been plenty of issues where failing over the mgr
resolved them.
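For reference, that's just a one-liner (optionally with a specific mgr
daemon name as argument):

ceph mgr fail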
If that doesn't help, the cephadm.log should show something useful
(/var/log/ceph/cephadm.log on the OSD hosts, I'm still not too
familiar with the whole 'ceph log last 200 debug cephadm' thing).
I remember reports in earlier versions of ceph-volume (probably
pre-cephadm) where not all OSDs were created if the host had many
disks to deploy. But I can't find those threads right now.
And it's strange that on the second cluster no OSD is created at
all, but again, maybe fail the mgr first before looking deeper into
it.
Regards,
Eugen
Zitat von "Kuhring, Mathias" <mathias.kuhring@xxxxxxxxxxxxxx>:
Dear ceph community,
We have trouble with new disks not being properly prepared, i.e.
OSDs not being fully installed by cephadm.
We just added one new node with ~40 HDDs each to two of our ceph
clusters.
In one cluster all but 5 disks got installed automatically.
In the other none got installed.
We are on ceph version 17.2.7
(b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) on both
clusters.
(I haven't added new disks since the last upgrade if I recall correctly).
This is our OSD service definition:
```
0|0[root@ceph-3-10 ~]# ceph orch ls osd --export
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
spec:
  data_devices:
    all: true
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: unmanaged
service_name: osd.unmanaged
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
```
Usually, new disks are installed properly (as expected due to
all-available-devices).
This time, I can see that LVs were created (via `lsblk`, `lvs`,
`cephadm ceph-volume lvm list`).
And OSDs are entered into the crushmap.
However, they are not assigned to a host yet, nor do they have a type
or weight, e.g.:
```
0|0[root@ceph-2-10 ~]# ceph osd tree | grep "0 osd"
518 0 osd.518 down 0 1.00000
519 0 osd.519 down 0 1.00000
520 0 osd.520 down 0 1.00000
521 0 osd.521 down 0 1.00000
522 0 osd.522 down 0 1.00000
```
And there is also no OSD daemon created (no docker container).
So, OSD creation is somehow stuck halfway.
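(For completeness, this is roughly how I checked the daemon side;
ceph-2-11 is the affected host:)
```
# check whether the orchestrator knows about any osd daemons on the new host
ceph orch ps ceph-2-11 | grep osd

# and whether any osd containers are actually running on the host itself
docker ps | grep osd
```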
I thought of fully cleaning up the OSD/disks.
Hoping cephadm might pick them up properly next time.
Just zapping was not possible, e.g. `cephadm ceph-volume lvm zap
--destroy /dev/sdab` results in these errors:
```
/usr/bin/docker: stderr stderr: wipefs: error: /dev/sdab: probing
initialization failed: Device or resource busy
/usr/bin/docker: stderr --> failed to wipefs device, will try again to workaround probable race condition
```
So, I cleaned up more manually by purging them from crush and
"resetting" disk and LV with dd and dmsetup, respectively:
```
ceph osd purge 480 --force
dd if=/dev/zero of=/dev/sdab bs=1M count=1
dmsetup remove ceph--e10e0f08--8705--441a--8caa--4590de22a611-osd--block--d464211c--f513--4513--86c1--c7ad63e6c142
```
ceph-volume still reported the old volumes, but then zapping actually
got rid of them (only cleaned out the left-over entries, I guess).
Now, cephadm was able to get one OSD up, when I did this cleanup for
only one disk.
When I did it in bulk for the rest, they all got stuck again the same way.
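For what it's worth, the bulk run was essentially the same steps
repeated per disk; a rough sketch (OSD ids and device names are just
examples and must match the host, and the dmsetup line removes ALL ceph
dm mappings, which is only safe here because the new host holds no
healthy OSDs yet):
```
# purge the half-created OSDs from the cluster
for osd in 518 519 520 521 522; do
    ceph osd purge "$osd" --force
done

# wipe the first MB of each affected disk
for dev in /dev/sdm /dev/sdn /dev/sdo; do
    dd if=/dev/zero of="$dev" bs=1M count=1
done

# drop the left-over ceph device-mapper mappings (names via "dmsetup ls")
dmsetup ls | awk '/^ceph--/ {print $1}' | xargs -r -n1 dmsetup remove
```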
Looking into ceph-volume logs (here for osd.522 as representative):
```
0|0[root@ceph-2-11
/var/log/ceph/55633ec3-6c0c-4a02-990c-0f87e0f7a01f]# ll *20240316
-rw-r--r-- 1 ceph ceph 613789 Mar 14 17:10 ceph-osd.522.log-20240316
-rw-r--r-- 1 root root 42473553 Mar 16 03:13 ceph-volume.log-20240316
```
ceph-volume only reports keyring creation:
```
[2024-03-14 16:10:19,509][ceph_volume.util.prepare][INFO ] Creating
keyring file for osd.522
[2024-03-14 16:10:19,510][ceph_volume.process][INFO ] Running
command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-522/keyring
--create-keyring --name osd.522 --add-key
AQBfIfNlinc7EBAAHeFicrjmLEjRPGSjuFuLiQ==
```
In the OSD logs I found a couple of these, but don't know if they are
related:
```
2024-03-14T16:10:54.706+0000 7fab26988540 2 rocksdb:
[db/column_family.cc:546] Failed to register data paths of column
family (id: 11, name: P)
```
Has anyone seen this behaviour before?
Or could anyone tell me where I should look next to troubleshoot this (which logs)?
Any help is appreciated.
Best Wishes,
Mathias
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx