Hello all!

Linked Stack Overflow post: https://stackoverflow.com/questions/75101087/cephadm-ceph-osd-fails-to-start-after-reboot-of-host

A couple of weeks ago I deployed a new Ceph cluster using cephadm. It is a three-node cluster (node1, node2, & node3) with 6 OSDs each: 6x 18TB Seagate hard drives per node, with a 2TB NVMe drive set as the DB device. Everything had been running smoothly until today, when I went to perform maintenance on one of the nodes. I first moved all of the services off the host and put it into maintenance mode. I then made some changes to one of the NICs and ran updates. After the updates were done, I rebooted the machine. This is when the issue occurred.

When the node (node1) finished rebooting, it was still showing as offline in the Ceph dashboard, so from one of the hosts I ran `ceph orch host rescan node1` and it came back online in the dashboard. I've seen this before when I've had to reboot hosts, so no big deal so far. However, after a couple of minutes the OSDs on that host still hadn't come online. I then checked the status of the services with `systemctl | grep ceph` and saw that all of the OSDs had failed.

# systemctl status ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6@osd.0.service
× ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6@osd.0.service - Ceph osd.0 for 0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6
     Loaded: loaded (/etc/systemd/system/ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6@.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2023-01-12 18:14:27 UTC; 1h 42min ago
   Main PID: 385982 (code=exited, status=1/FAILURE)
        CPU: 292ms

Jan 12 19:48:30 node1 systemd[1]: /etc/systemd/system/ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6@.service:24: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer Kill

The unit had hit its restart-counter limit, so I had to run `systemctl reset-failed`, and then I tried restarting the OSDs by running `systemctl restart ceph.target`. I watched the services try to start, but they kept failing.
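(For reference, this is roughly how I was pulling the per-daemon state and logs; these are just the standard systemd/cephadm tools with my cluster's fsid filled in, nothing specific to this setup:)

# list the ceph units that have failed on the host
systemctl --failed | grep ceph

# journal for the failing OSD unit (cephadm names the units ceph-<fsid>@osd.<id>.service)
journalctl -u ceph-0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6@osd.0.service

# same journal, addressed by daemon name through cephadm
cephadm logs --fsid 0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6 --name osd.0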
This was the output of /var/log/ceph/<fsid>/ceph-osd.0.log:

2023-01-12T18:12:06.501+0000 7fb5d3b1e3c0 0 set uid:gid to 167:167 (ceph:ceph)
2023-01-12T18:12:06.501+0000 7fb5d3b1e3c0 0 ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable), process ceph-osd, pid 7
2023-01-12T18:12:06.501+0000 7fb5d3b1e3c0 0 pidfile_write: ignore empty --pid-file
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f87400 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f87400 /var/lib/ceph/osd/ceph-0/block) open size 20000584761344 (0x1230bfc00000, 18 TiB) block_size 4096 (4 KiB) rotational discard not supported
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bluestore(/var/lib/ceph/osd/ceph-0) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86c00 /var/lib/ceph/osd/ceph-0/block.db) open path /var/lib/ceph/osd/ceph-0/block.db
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86c00 /var/lib/ceph/osd/ceph-0/block.db) open size 333396836352 (0x4da0000000, 310 GiB) block_size 4096 (4 KiB) non-rotational discard supported
2023-01-12T18:12:06.505+0000 7fb5d3b1e3c0 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-0/block.db size 310 GiB
2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86800 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86800 /var/lib/ceph/osd/ceph-0/block) open size 20000584761344 (0x1230bfc00000, 18 TiB) block_size 4096 (4 KiB) rotational discard not supported
2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bluefs add_block_device bdev 2 path /var/lib/ceph/osd/ceph-0/block size 18 TiB
2023-01-12T18:12:06.513+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86c00 /var/lib/ceph/osd/ceph-0/block.db) close
2023-01-12T18:12:06.817+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f86800 /var/lib/ceph/osd/ceph-0/block) close
2023-01-12T18:12:07.085+0000 7fb5d3b1e3c0 1 bdev(0x5591e1f87400 /var/lib/ceph/osd/ceph-0/block) close
2023-01-12T18:12:07.305+0000 7fb5d3b1e3c0 0 starting osd.0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 0 load: jerasure load: lrc
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 0 osd.0:0.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 0 osd.0:1.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.321+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 0 osd.0:2.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 0 osd.0:3.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_max_osd_capacity #op shards: 5 max osd capacity(iops) per shard: 863.20
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_io osd_mclock_cost_per_io: 0.0250000
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_osd_mclock_cost_per_byte osd_mclock_cost_per_byte: 0.0000052
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 mClockScheduler: set_mclock_profile mclock profile: high_client_ops
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 0 osd.0:4.OSDShard using op scheduler mClockScheduler
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 bdev(0x5591e2d8e000 /var/lib/ceph/osd/ceph-0/block) open open got: (13) Permission denied
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 osd.0 0 OSD:init: unable to mount object store
2023-01-12T18:12:07.325+0000 7fb5d3b1e3c0 -1 ** ERROR: osd init failed: (13) Permission denied

Judging by the final error, it looks like some sort of permissions issue with mounting the volume into the container. I did notice that on the other two hosts, node2 & node3, which I have not yet rebooted since deploying Ceph with cephadm, there were more Docker overlay mounts showing when I ran the `mount` command. My theory is that the LVM volumes backing the OSDs are not being mounted at boot. Alternatively, it might be that the user Ceph passes to the containers is not allowed to mount the volumes for some reason.

I've looked through most of the docs and forums I could find and haven't found any solutions. I'd like to say I'm fairly experienced with Linux (5+ years), but I am new to Ceph (~6 months) and I haven't emailed this list before. Sorry in advance if I've mistakenly broken any rules, and thanks for the help!

- Ben M
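P.S. To make the theory above concrete, this is roughly what I was planning to check next to confirm or rule it out. Treat it as a sketch, not a fix; in particular I'm not certain the ceph-volume invocations below are the right ones for cephadm-managed OSDs. The idea is to see whether the OSD LVs are actually active after the reboot and whether the ceph user (167:167, per the log above) owns the underlying /dev/mapper devices:

# where the block symlink for osd.0 points, and who owns the target device
ls -l  /var/lib/ceph/0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6/osd.0/block
ls -lL /var/lib/ceph/0a7ec2ae-816d-11ed-9791-97c1d8fb9dc6/osd.0/block

# are the OSD LVs even active after the reboot?
lvscan | grep ceph

# what ceph-volume knows about the OSD LVs on this host
cephadm ceph-volume -- lvm list

# re-run the LVM activation that should have happened at boot (the step I'm least sure about)
cephadm ceph-volume -- lvm activate --all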