Hi Wladimir, according to the logs you first sent, it seems there is an
authentication issue (the osd daemon is not able to fetch the mon config):

> жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> жов 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config (--no-mon-config to skip)
> жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process exited, code=exited, status=1/FAILURE

The file it fails to load the keyring from is where the auth details for the osd daemon should be. Some more info here:

https://docs.ceph.com/en/latest/man/8/ceph-authtool/
https://docs.ceph.com/en/latest/rados/configuration/auth-config-ref/
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/ (specifically step 5)

I'm not sure if you were able to fix it or not, but I'd start by getting that fixed before playing with ceph-volume. I've put a rough sketch of the commands I'd try at the bottom of this mail, below the quoted thread.

On 10/27 10:24, Wladimir Mutel wrote:
> Dear David,
>
> I assimilated most of my Ceph configuration into the cluster itself, since this feature was announced with Mimic.
> I see some fsid in the [global] section of /etc/ceph/ceph.conf, and some key in the [client.admin] section of /etc/ceph/ceph.client.admin.keyring.
> The rest is pretty uninteresting: some minimal adjustments in the config file and the cluster's config dump.
>
> Looking into the Python scripts of ceph-volume, I noticed that a tmpfs is mounted during the run of "ceph-volume lvm activate",
> and that "ceph-bluestore-tool prime-osd-dir" is started from the same script afterwards.
> Should I try starting "ceph-volume lvm activate" in some manual way to see where it stumbles and why?
>
> David Caro wrote:
> > Hi Wladimir,
> >
> > If the "unable to find keyring" message disappeared, what was the error after that fix?
> >
> > If it's still failing to fetch the mon config, check your authentication (you might have to add the osd key to the keyring again), and/or that the mon IPs are correct in your osd ceph.conf file.
> >
> > On 23 October 2020 16:08:02 CEST, Wladimir Mutel <mwg@xxxxxxxxx> wrote:
> > > Dear all,
> > >
> > > after breaking my experimental 1-host Ceph cluster and making one of its
> > > pgs 'incomplete', I left it in an abandoned state for some time.
> > > Now I decided to bring it back to life and found that it cannot
> > > start one of its OSDs (osd.1, to name it).
> > >
> > > "ceph osd df" shows:
> > >
> > > ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
> > >  0  hdd    0        1.00000   2.7 TiB  1.6 TiB  1.6 TiB  113 MiB  4.7 GiB  1.1 TiB  59.77  0.69  102  up
> > >  1  hdd    2.84549  0             0 B      0 B      0 B      0 B      0 B      0 B      0     0    0  down
> > >  2  hdd    2.84549  1.00000   2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.8 GiB  275 GiB  90.58  1.05  176  up
> > >  3  hdd    2.84549  1.00000   2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.9 GiB  271 GiB  90.69  1.05  185  up
> > >  4  hdd    2.84549  1.00000   2.8 TiB  2.6 TiB  2.5 TiB   63 MiB  4.2 GiB  263 GiB  90.98  1.05  184  up
> > >  5  hdd    2.84549  1.00000   2.8 TiB  2.6 TiB  2.5 TiB   52 MiB  3.8 GiB  263 GiB  90.96  1.05  178  up
> > >  6  hdd    2.53400  1.00000   2.5 TiB  2.3 TiB  2.3 TiB  173 MiB  5.2 GiB  228 GiB  91.21  1.05  178  up
> > >  7  hdd    2.53400  1.00000   2.5 TiB  2.3 TiB  2.3 TiB  147 MiB  5.2 GiB  230 GiB  91.12  1.05  168  up
> > >           TOTAL      19 TiB   17 TiB   16 TiB  662 MiB   31 GiB  2.6 TiB  86.48
> > > MIN/MAX VAR: 0.69/1.05  STDDEV: 10.90
> > >
> > > "ceph device ls" shows:
> > >
> > > DEVICE                                      HOST:DEV      DAEMONS                        LIFE EXPECTANCY
> > > GIGABYTE_GP-ASACNE2100TTTDR_SN191108950380  p10s:nvme0n1  osd.1 osd.2 osd.3 osd.4 osd.5
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K1JJXVST        p10s:sdd      osd.1
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K1VUYPRA        p10s:sda      osd.6
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2CKX8NT        p10s:sdb      osd.7
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2UD8H74        p10s:sde      osd.2
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K2VFTR1F        p10s:sdh      osd.5
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K3CYKL87        p10s:sdf      osd.3
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K6FPZAJP        p10s:sdc      osd.0
> > > WDC_WD30EFRX-68N32N0_WD-WCC7K7FXSCRN        p10s:sdg      osd.4
> > >
> > > In my last migration, I created a bluestore volume with an external
> > > block.db like this:
> > >
> > > "ceph-volume lvm prepare --bluestore --data /dev/sdd1 --block.db /dev/nvme0n1p4"
> > >
> > > And I can see this metadata by
> > >
> > > "ceph-bluestore-tool show-label --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202":
> > >
> > > {
> > >     "/dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202": {
> > >         "osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
> > >         "size": 3000588304384,
> > >         "btime": "2020-07-12T11:34:16.579735+0300",
> > >         "description": "main",
> > >         "bfm_blocks": "45785344",
> > >         "bfm_blocks_per_key": "128",
> > >         "bfm_bytes_per_block": "65536",
> > >         "bfm_size": "3000588304384",
> > >         "bluefs": "1",
> > >         "ceph_fsid": "49cdfe90-6f6e-4afe-8558-bf14a13aadfa",
> > >         "kv_backend": "rocksdb",
> > >         "magic": "ceph osd volume v026",
> > >         "mkfs_done": "yes",
> > >         "osd_key": "AQD9ygpf+7+MABAAqtj4y1YYgxwCaAN/jgDSwg==",
> > >         "ready": "ready",
> > >         "require_osd_release": "14",
> > >         "whoami": "1"
> > >     }
> > > }
> > >
> > > and by
> > >
> > > "ceph-bluestore-tool show-label --dev /dev/nvme0n1p4":
> > >
> > > {
> > >     "/dev/nvme0n1p4": {
> > >         "osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
> > >         "size": 128025886720,
> > >         "btime": "2020-07-12T11:34:16.592054+0300",
> > >         "description": "bluefs db"
> > >     }
> > > }
> > >
> > > As you can see, their osd_uuid is the same.
> > > But when I try to start it by hand with "systemctl restart ceph-osd@1",
> > > I get this in the logs ("journalctl -b -u ceph-osd@1"):
> > >
> > > -- Logs begin at Tue 2020-10-13 19:09:49 EEST, end at Fri 2020-10-23 16:59:38 EEST. --
> > > жов 23 16:59:36 p10s systemd[1]: Starting Ceph object storage daemon osd.1...
> > > жов 23 16:59:36 p10s systemd[1]: Started Ceph object storage daemon osd.1.
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> > > жов 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
> > > жов 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config (--no-mon-config to skip)
> > > жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process exited, code=exited, status=1/FAILURE
> > > жов 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Failed with result 'exit-code'.
> > >
> > > And so my question is: how do I make this OSD known again to the Ceph cluster
> > > without recreating it anew with ceph-volume?
> > > I see that every folder under "/var/lib/ceph/osd/" is a tmpfs mount
> > > point filled with the appropriate files and symlinks, except for
> > > "/var/lib/ceph/osd/ceph-1", which is just an empty folder not mounted anywhere.
> > > I tried to run
> > >
> > > "ceph-bluestore-tool prime-osd-dir --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 --path /var/lib/ceph/osd/ceph-1"
> > >
> > > It created some files under /var/lib/ceph/osd/ceph-1, but without the tmpfs
> > > mount, and these files belonged to root. I changed the owner of these files
> > > to ceph:ceph and created the appropriate symlinks for block and block.db,
> > > but ceph-osd@1 still did not want to start. Only the "unable to find keyring"
> > > messages disappeared.
> > >
> > > Please give any help on where to move next.
> > > Thanks in advance for your help.
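To be concrete about the keyring part, this is roughly what I'd try. It is only a sketch, untested, and it assumes the osd.1 key still exists in the cluster's auth database and that the mon addresses in your ceph.conf are correct:

    # check that the cluster still has a key for osd.1; it should match the
    # "osd_key" field you see in the bluestore label of that OSD
    ceph auth get osd.1

    # let ceph-volume rebuild the tmpfs dir, symlinks and keyring for that OSD
    # (roughly what you were doing by hand with prime-osd-dir); the second
    # argument is the osd_uuid from your show-label output
    ceph-volume lvm activate 1 8c6324a3-0364-4fad-9dcb-81a1661ee202

    # if the directory is populated but the keyring is still missing, write it
    # out from the cluster and fix ownership
    ceph auth get osd.1 -o /var/lib/ceph/osd/ceph-1/keyring
    chown ceph:ceph /var/lib/ceph/osd/ceph-1/keyring

    # then try starting the daemon again
    systemctl restart ceph-osd@1

If "ceph auth get osd.1" reports that the entity does not exist, the key was removed from the cluster and you will have to re-add it (see the add-or-rm-osds link above) before the osd can authenticate and fetch the mon config.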
--
David Caro

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx