Hi Eugen,

the test cluster where we started with plain Ceph and where the adoption went straightforward is working fine. But this test cluster was all over the place: we had an old update via the orchestrator that was still in the pipeline, the adoption process had been stopped a year ago and only got picked up again now, and so on and so forth. Now we have it clean, or at least we think it's clean. After a reboot the services are not available; I have to start them via ceph orch.

root@0cc47a6df14e:~# systemctl list-units | grep ceph
ceph-crash.service    loaded active running  Ceph crash dump collector
ceph-fuse.target      loaded active active   ceph target allowing to start/stop all ceph-fuse@.service instances at once
ceph-mds.target       loaded active active   ceph target allowing to start/stop all ceph-mds@.service instances at once
ceph-mgr.target       loaded active active   ceph target allowing to start/stop all ceph-mgr@.service instances at once
ceph-mon.target       loaded active active   ceph target allowing to start/stop all ceph-mon@.service instances at once
ceph-osd.target       loaded active active   ceph target allowing to start/stop all ceph-osd@.service instances at once
ceph-radosgw.target   loaded active active   ceph target allowing to start/stop all ceph-radosgw@.service instances at once
ceph.target           loaded active active   All Ceph clusters and services

root@0cc47a6df14e:~# ceph orch start mgr
Scheduled to start mgr.0cc47a6df14e.nvjlcx on host '0cc47a6df14e'
Scheduled to start mgr.0cc47a6df330.aznjao on host '0cc47a6df330'
Scheduled to start mgr.0cc47aad8ce8.ifiydp on host '0cc47aad8ce8'
root@0cc47a6df14e:~# ceph orch start mon
Scheduled to start mon.0cc47a6df14e on host '0cc47a6df14e'
Scheduled to start mon.0cc47a6df330 on host '0cc47a6df330'
Scheduled to start mon.0cc47aad8ce8 on host '0cc47aad8ce8'
root@0cc47a6df14e:~# ceph orch start osd.all-flash-over-1tb
Scheduled to start osd.2 on host '0cc47a6df14e'
Scheduled to start osd.5 on host '0cc47a6df14e'
Scheduled to start osd.3 on host '0cc47a6df330'
Scheduled to start osd.0 on host '0cc47a6df330'
Scheduled to start osd.4 on host '0cc47aad8ce8'
Scheduled to start osd.1 on host '0cc47aad8ce8'

root@0cc47a6df14e:~# systemctl list-units | grep ceph
ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@mgr.0cc47a6df14e.nvjlcx.service   loaded active running  Ceph mgr.0cc47a6df14e.nvjlcx for 03977a23-f00f-4bb0-b9a7-de57f40ba853
ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@mon.0cc47a6df14e.service          loaded active running  Ceph mon.0cc47a6df14e for 03977a23-f00f-4bb0-b9a7-de57f40ba853
ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@osd.2.service                     loaded active running  Ceph osd.2 for 03977a23-f00f-4bb0-b9a7-de57f40ba853
ceph-crash.service                                                          loaded active running  Ceph crash dump collector
system-ceph\x2d03977a23\x2df00f\x2d4bb0\x2db9a7\x2dde57f40ba853.slice       loaded active active   system-ceph\x2d03977a23\x2df00f\x2d4bb0\x2db9a7\x2dde57f40ba853.slice
ceph-fuse.target      loaded active active   ceph target allowing to start/stop all ceph-fuse@.service instances at once
ceph-mds.target       loaded active active   ceph target allowing to start/stop all ceph-mds@.service instances at once
ceph-mgr.target       loaded active active   ceph target allowing to start/stop all ceph-mgr@.service instances at once
ceph-mon.target       loaded active active   ceph target allowing to start/stop all ceph-mon@.service instances at once
ceph-osd.target       loaded active active   ceph target allowing to start/stop all ceph-osd@.service instances at once
ceph-radosgw.target   loaded active active   ceph target allowing to start/stop all ceph-radosgw@.service instances at once
ceph.target           loaded active active   All Ceph clusters and services
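To rule out that the units simply aren't wired into the boot chain, this is roughly what I plan to check next (just a sketch, not yet run on this cluster; the FSID and daemon names are taken from the output above):

fsid=03977a23-f00f-4bb0-b9a7-de57f40ba853
# is the top-level ceph target enabled, and is there a per-cluster target for our FSID?
systemctl is-enabled ceph.target
systemctl is-enabled ceph-${fsid}.target
# which ceph units are installed but not enabled for boot?
systemctl list-unit-files 'ceph*' --state=disabled
# if the targets are missing from the boot chain, enabling them should be all that's needed:
# systemctl enable ceph.target ceph-${fsid}.target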
root@0cc47a6df14e:~# systemctl status ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@mgr.0cc47a6df14e.nvjlcx.service
● ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@mgr.0cc47a6df14e.nvjlcx.service - Ceph mgr.0cc47a6df14e.nvjlcx for 03977a23-f00f-4bb0-b9a7-de57f40ba853
     Loaded: loaded (/etc/systemd/system/ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2023-09-16 09:18:53 UTC; 51s ago
    Process: 4828 ExecStartPre=/bin/rm -f /run/ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@mgr.0cc47a6df14e.nvjlcx.service-pid /run/ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@mgr.0cc47a6df1>
    Process: 4829 ExecStart=/bin/bash /var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/mgr.0cc47a6df14e.nvjlcx/unit.run (code=exited, status=0/SUCCESS)
   Main PID: 5132 (conmon)
      Tasks: 36 (limit: 309227)
     Memory: 512.0M
     CGroup: /system.slice/system-ceph\x2d03977a23\x2df00f\x2d4bb0\x2db9a7\x2dde57f40ba853.slice/ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@mgr.0cc47a6df14e.nvjlcx.service
             ├─container
             │ ├─5136 /dev/init -- /usr/bin/ceph-mgr -n mgr.0cc47a6df14e.nvjlcx -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log>
             │ └─5139 /usr/bin/ceph-mgr -n mgr.0cc47a6df14e.nvjlcx -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=fa>
             └─supervisor
               └─5132 /usr/libexec/podman/conmon --api-version 1 -c 0165b4f78867ad284cc65fbece46013e6547a2f3ecf99cc7ffb8b720f705ee66 -u 0165b4f78867ad284cc65fbece46013e6547a2f3ecf99cc7ff>

Sep 16 09:19:04 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]: 2023-09-16T09:19:04.333+0000 7f4fcc0a91c0 -1 mgr[py] Module alert>
Sep 16 09:19:04 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]: 2023-09-16T09:19:04.501+0000 7f4fcc0a91c0 -1 mgr[py] Module iosta>
Sep 16 09:19:05 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]: 2023-09-16T09:19:05.249+0000 7f4fcc0a91c0 -1 mgr[py] Module orche>
Sep 16 09:19:05 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]: 2023-09-16T09:19:05.481+0000 7f4fcc0a91c0 -1 mgr[py] Module rbd_s>
Sep 16 09:19:06 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]: [16/Sep/2023:09:19:06] ENGINE Bus STARTING
Sep 16 09:19:06 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]: CherryPy Checker:
Sep 16 09:19:06 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]: The Application mounted at '' has an empty config.
Sep 16 09:19:06 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]:
Sep 16 09:19:06 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]: [16/Sep/2023:09:19:06] ENGINE Serving on http://:::9283
Sep 16 09:19:06 0cc47a6df14e.f00f.gridscale.dev ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df14e-nvjlcx[5132]: [16/Sep/2023:09:19:06] ENGINE Bus STARTED
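So after the manual start the unit shows up as enabled and running. To see why it did not come up during the boot itself, I will also check the boot journal for these units; roughly something like this (unit names copied from the status output above, not a definitive procedure):

fsid=03977a23-f00f-4bb0-b9a7-de57f40ba853
# everything the mgr unit logged during the current boot
journalctl -b -u ceph-${fsid}@mgr.0cc47a6df14e.nvjlcx.service
# or, more broadly, anything systemd said about ceph services at boot
journalctl -b | grep -i 'ceph.*\.service'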
This seems to be the cephadm log:

cephadm ['adopt', '--style', 'legacy', '--name', 'osd.3']
2023-09-15 11:32:44,290 7fef7b041740 INFO Pulling container image quay.io/ceph/ceph:v17...
2023-09-15 11:32:47,128 7fef7b041740 INFO Found online OSD at //var/lib/ceph/osd/ceph-3/fsid
2023-09-15 11:32:47,129 7fef7b041740 INFO objectstore_type is bluestore
2023-09-15 11:32:47,150 7fef7b041740 INFO Stopping old systemd unit ceph-osd@3...
2023-09-15 11:32:48,560 7fef7b041740 INFO Disabling old systemd unit ceph-osd@3...
2023-09-15 11:32:49,157 7fef7b041740 INFO Moving data...
2023-09-15 11:32:49,158 7fef7b041740 DEBUG move file '//var/lib/ceph/osd/ceph-3/require_osd_release' -> '/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/require_osd_release'
2023-09-15 11:32:49,158 7fef7b041740 DEBUG chown 167:167 `/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/require_osd_release`
2023-09-15 11:32:49,158 7fef7b041740 DEBUG symlink '/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/block' -> '/dev/ceph-66d3bb27-cd5c-4897-aa76-684bc46d1c8b/osd-block-4bfc2101-e9b2-468d-8f54-a05f080ebdfe'
2023-09-15 11:32:49,158 7fef7b041740 DEBUG move file '//var/lib/ceph/osd/ceph-3/ready' -> '/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/ready'
2023-09-15 11:32:49,159 7fef7b041740 DEBUG chown 167:167 `/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/ready`
2023-09-15 11:32:49,159 7fef7b041740 DEBUG move file '//var/lib/ceph/osd/ceph-3/type' -> '/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/type'
2023-09-15 11:32:49,159 7fef7b041740 DEBUG chown 167:167 `/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/type`
2023-09-15 11:32:49,159 7fef7b041740 DEBUG move file '//var/lib/ceph/osd/ceph-3/fsid' -> '/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/fsid'
2023-09-15 11:32:49,159 7fef7b041740 DEBUG chown 167:167 `/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/fsid`
2023-09-15 11:32:49,160 7fef7b041740 DEBUG move file '//var/lib/ceph/osd/ceph-3/ceph_fsid' -> '/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/ceph_fsid'
2023-09-15 11:32:49,160 7fef7b041740 DEBUG chown 167:167 `/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/ceph_fsid`
2023-09-15 11:32:49,160 7fef7b041740 DEBUG move file '//var/lib/ceph/osd/ceph-3/keyring' -> '/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/keyring'
2023-09-15 11:32:49,160 7fef7b041740 DEBUG chown 167:167 `/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/keyring`
2023-09-15 11:32:49,160 7fef7b041740 DEBUG move file '//var/lib/ceph/osd/ceph-3/whoami' -> '/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/whoami'
2023-09-15 11:32:49,161 7fef7b041740 DEBUG chown 167:167 `/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/whoami`
2023-09-15 11:32:49,161 7fef7b041740 DEBUG Remove dir `//var/lib/ceph/osd/ceph-3`
2023-09-15 11:32:49,166 7fef7b041740 INFO Chowning content...
2023-09-15 11:32:49,171 7fef7b041740 DEBUG chown: stdout changed ownership of '/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/block' from root:root to 167:167
2023-09-15 11:32:49,172 7fef7b041740 INFO Chowning /var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/block...
2023-09-15 11:32:49,172 7fef7b041740 INFO Disabling host unit ceph-volume@ lvm unit...
2023-09-15 11:32:49,649 7fef7b041740 DEBUG systemctl: stderr Removed /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-3-4bfc2101-e9b2-468d-8f54-a05f080ebdfe.service.
2023-09-15 11:32:49,650 7fef7b041740 DEBUG copy file `//etc/ceph/ceph.conf` -> `/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/config`
2023-09-15 11:32:49,650 7fef7b041740 DEBUG chown 167:167 `/var/lib/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/osd.3/config`
2023-09-15 11:32:49,650 7fef7b041740 INFO Moving logs...
2023-09-15 11:32:49,651 7fef7b041740 DEBUG move file '//var/log/ceph/ceph-osd.3.log' -> '/var/log/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/ceph-osd.3.log'
2023-09-15 11:32:49,651 7fef7b041740 DEBUG chown 167:167 `/var/log/ceph/03977a23-f00f-4bb0-b9a7-de57f40ba853/ceph-osd.3.log`
2023-09-15 11:32:49,651 7fef7b041740 INFO Creating new units...
2023-09-15 11:32:50,803 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/10-console-messages.conf ...
2023-09-15 11:32:50,803 7fef7b041740 DEBUG sysctl: stdout kernel.printk = 4 4 1 7
2023-09-15 11:32:50,803 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/10-ipv6-privacy.conf ...
2023-09-15 11:32:50,803 7fef7b041740 DEBUG sysctl: stdout net.ipv6.conf.all.use_tempaddr = 2
2023-09-15 11:32:50,803 7fef7b041740 DEBUG sysctl: stdout net.ipv6.conf.default.use_tempaddr = 2
2023-09-15 11:32:50,803 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/10-kernel-hardening.conf ...
2023-09-15 11:32:50,803 7fef7b041740 DEBUG sysctl: stdout kernel.kptr_restrict = 1
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/10-link-restrictions.conf ...
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout fs.protected_hardlinks = 1
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout fs.protected_symlinks = 1
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/10-magic-sysrq.conf ...
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout kernel.sysrq = 176
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/10-network-security.conf ...
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout net.ipv4.conf.default.rp_filter = 2
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout net.ipv4.conf.all.rp_filter = 2
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/10-ptrace.conf ...
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout kernel.yama.ptrace_scope = 1
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/10-zeropage.conf ...
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout vm.mmap_min_addr = 65536
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/30-ceph-osd.conf ...
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout fs.aio-max-nr = 1048576
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout kernel.pid_max = 4194304
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout * Applying /usr/lib/sysctl.d/50-coredump.conf ...
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout kernel.core_pattern = |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout * Applying /usr/lib/sysctl.d/50-default.conf ...
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout net.ipv4.conf.default.promote_secondaries = 1
2023-09-15 11:32:50,804 7fef7b041740 DEBUG sysctl: stdout net.ipv4.ping_group_range = 0 2147483647
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout net.core.default_qdisc = fq_codel
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout fs.protected_regular = 1
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout fs.protected_fifos = 1
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout * Applying /usr/lib/sysctl.d/50-pid-max.conf ...
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout kernel.pid_max = 4194304
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/90-ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-osd.conf ...
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout fs.aio-max-nr = 1048576
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout kernel.pid_max = 4194304
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.d/99-sysctl.conf ...
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout * Applying /usr/lib/sysctl.d/protect-links.conf ...
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout fs.protected_fifos = 1
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout fs.protected_hardlinks = 1
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout fs.protected_regular = 2
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout fs.protected_symlinks = 1
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stdout * Applying /etc/sysctl.conf ...
2023-09-15 11:32:50,805 7fef7b041740 DEBUG sysctl: stderr sysctl: setting key "net.ipv4.conf.all.promote_secondaries": Invalid argument
2023-09-15 11:32:51,469 7fef7b041740 DEBUG Non-zero exit code 1 from systemctl reset-failed ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@osd.3
2023-09-15 11:32:51,469 7fef7b041740 DEBUG systemctl: stderr Failed to reset failed state of unit ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@osd.3.service: Unit ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@osd.3.service not loaded.
2023-09-15 11:32:51,954 7fef7b041740 DEBUG systemctl: stderr Created symlink /etc/systemd/system/ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853.target.wants/ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@osd.3.service → /etc/systemd/system/ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853@.service.
2023-09-15 11:32:54,331 7fef7b041740 DEBUG firewalld does not appear to be present
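So the adoption did create the symlink into the ceph-<FSID>.target.wants directory. Next I want to double-check the rest of the chain up to multi-user.target; a rough sketch of what I'd run (untested here, FSID taken from the log above):

fsid=03977a23-f00f-4bb0-b9a7-de57f40ba853
# who pulls the osd unit in at boot?
systemctl list-dependencies --reverse ceph-${fsid}@osd.3.service
# is the per-cluster target itself hooked into ceph.target / multi-user.target?
systemctl is-enabled ceph-${fsid}.target ceph.target
ls -l /etc/systemd/system/multi-user.target.wants/ | grep -i ceph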
On Sat, 16 Sept 2023 at 10:25, Eugen Block <eblock@xxxxxx> wrote:

> That sounds a bit strange to me, because all clusters we adopted so
> far successfully converted the previous systemd units into systemd
> units targeting the pods. This process should also have been logged
> (stdout, probably in the cephadm.log as well), resulting in "enabled"
> systemd units. Can you paste the output of 'systemctl status
> ceph-<FSID>@mon.<MON>'? If you have it, please also share the logs
> from the adoption process.
> What I did notice in a test cluster a while ago was that I had to
> reboot a node where I had to "play around" a bit with removed and
> redeployed OSD containers. At some point they didn't react to
> systemctl commands anymore, but a reboot fixed that. But I haven't
> seen that in a production cluster yet, so some more details would be
> useful.
>
> Quoting Boris Behrens <bb@xxxxxxxxx>:
>
> > Hi,
> > is there a way to have the pods start again after reboot?
> > Currently I need to start them by hand via ceph orch start
> > mon/mgr/osd/...
> >
> > I imagine this will lead to a lot of headache when the ceph cluster
> > gets a powercycle and the mon pods will not start automatically.
> >
> > I've spun up a test cluster and there the pods start very fast. On
> > the legacy test cluster, which got adopted to cephadm, it does not.
> >
> > Cheers
> > Boris

--
This time, as an exception, the "UTF-8 problems" self-help group meets in the big hall.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx