Dear David,
I assimilated most of my Ceph configuration into the cluster itself when this feature was introduced in Mimic.
I only see an fsid in the [global] section of /etc/ceph/ceph.conf and a key in the [client.admin] section of /etc/ceph/ceph.client.admin.keyring.
The rest is pretty uninteresting: some minimal adjustments in the config file and in the cluster's config dump.
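(For clarity, what I mean by "assimilated" is roughly the standard "ceph config" workflow available since Mimic; a minimal sketch, not my exact command history:)

    ceph config assimilate-conf -i /etc/ceph/ceph.conf   # move options from the file into the mon config database
    ceph config dump                                     # check what the cluster now stores
    ceph config generate-minimal-conf                    # the minimal ceph.conf (fsid, mon_host) left on disk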
Looking into the Python scripts of ceph-volume, I noticed that a tmpfs is mounted during the "ceph-volume lvm activate" run,
and that "ceph-bluestore-tool prime-osd-dir" is started from the same script afterwards.
Should I try running "ceph-volume lvm activate" manually to see where it stumbles and why?
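(Concretely, I was thinking of something along these lines; a rough sketch using the stock ceph-volume CLI, with the OSD id and fsid taken from the show-label output quoted below:)

    ceph-volume lvm list                                              # check the LV tags ceph-volume would use for osd.1
    ceph-volume lvm activate 1 8c6324a3-0364-4fad-9dcb-81a1661ee202   # activate just this OSD
    ceph-volume lvm activate --all                                    # or let it activate whatever it can find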
David Caro wrote:
Hi Wladim,
If the "unable to find keyring" message disappeared, what was the error after that fix?
If it's still failing to fetch the mon config, check your authentication (you might have to add the OSD key to the keyring again), and/or that the mon IPs are correct in your OSD's ceph.conf file.
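(Roughly like this; a minimal sketch assuming the default keyring path and that the osd.1 key still exists in the cluster:)

    ceph auth get osd.1                                       # compare with the "osd_key" shown by show-label
    ceph auth get osd.1 -o /var/lib/ceph/osd/ceph-1/keyring   # rewrite the local keyring if it is missing
    chown ceph:ceph /var/lib/ceph/osd/ceph-1/keyring
    grep -E 'mon_host|mon host' /etc/ceph/ceph.conf           # verify the mon addresses the OSD will try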
On 23 October 2020 16:08:02 CEST, Wladimir Mutel <mwg@xxxxxxxxx> wrote:
Dear all,
after breaking my experimental 1-host Ceph cluster and leaving one of its
PGs 'incomplete', I left it in an abandoned state for some time.
Now I have decided to bring it back to life and found that it cannot
start one of its OSDs (osd.1, to name it).
"ceph osd df" shows :
ID CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0  hdd   0         1.00000  2.7 TiB  1.6 TiB  1.6 TiB  113 MiB  4.7 GiB  1.1 TiB  59.77  0.69  102  up
 1  hdd   2.84549   0            0 B      0 B      0 B      0 B      0 B      0 B      0     0     0  down
 2  hdd   2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.8 GiB  275 GiB  90.58  1.05  176  up
 3  hdd   2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   57 MiB  3.9 GiB  271 GiB  90.69  1.05  185  up
 4  hdd   2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   63 MiB  4.2 GiB  263 GiB  90.98  1.05  184  up
 5  hdd   2.84549   1.00000  2.8 TiB  2.6 TiB  2.5 TiB   52 MiB  3.8 GiB  263 GiB  90.96  1.05  178  up
 6  hdd   2.53400   1.00000  2.5 TiB  2.3 TiB  2.3 TiB  173 MiB  5.2 GiB  228 GiB  91.21  1.05  178  up
 7  hdd   2.53400   1.00000  2.5 TiB  2.3 TiB  2.3 TiB  147 MiB  5.2 GiB  230 GiB  91.12  1.05  168  up
                    TOTAL     19 TiB   17 TiB   16 TiB  662 MiB   31 GiB  2.6 TiB  86.48
MIN/MAX VAR: 0.69/1.05  STDDEV: 10.90
"ceph device ls" shows :
DEVICE                                      HOST:DEV      DAEMONS                        LIFE EXPECTANCY
GIGABYTE_GP-ASACNE2100TTTDR_SN191108950380  p10s:nvme0n1  osd.1 osd.2 osd.3 osd.4 osd.5
WDC_WD30EFRX-68N32N0_WD-WCC7K1JJXVST        p10s:sdd      osd.1
WDC_WD30EFRX-68N32N0_WD-WCC7K1VUYPRA        p10s:sda      osd.6
WDC_WD30EFRX-68N32N0_WD-WCC7K2CKX8NT        p10s:sdb      osd.7
WDC_WD30EFRX-68N32N0_WD-WCC7K2UD8H74        p10s:sde      osd.2
WDC_WD30EFRX-68N32N0_WD-WCC7K2VFTR1F        p10s:sdh      osd.5
WDC_WD30EFRX-68N32N0_WD-WCC7K3CYKL87        p10s:sdf      osd.3
WDC_WD30EFRX-68N32N0_WD-WCC7K6FPZAJP        p10s:sdc      osd.0
WDC_WD30EFRX-68N32N0_WD-WCC7K7FXSCRN        p10s:sdg      osd.4
In my last migration, I created a bluestore volume with external block.db like this :
"ceph-volume lvm prepare --bluestore --data /dev/sdd1 --block.db /dev/nvme0n1p4"
And I can see this metadata by
"ceph-bluestore-tool show-label --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202" :
{
    "/dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202": {
        "osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
        "size": 3000588304384,
        "btime": "2020-07-12T11:34:16.579735+0300",
        "description": "main",
        "bfm_blocks": "45785344",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "65536",
        "bfm_size": "3000588304384",
        "bluefs": "1",
        "ceph_fsid": "49cdfe90-6f6e-4afe-8558-bf14a13aadfa",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "osd_key": "AQD9ygpf+7+MABAAqtj4y1YYgxwCaAN/jgDSwg==",
        "ready": "ready",
        "require_osd_release": "14",
        "whoami": "1"
    }
}
and by "ceph-bluestore-tool show-label --dev /dev/nvme0n1p4" :
{
    "/dev/nvme0n1p4": {
        "osd_uuid": "8c6324a3-0364-4fad-9dcb-81a1661ee202",
        "size": 128025886720,
        "btime": "2020-07-12T11:34:16.592054+0300",
        "description": "bluefs db"
    }
}
As you see, their osd_uuid values are equal.
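(Just in case it helps, the LV tags that "ceph-volume lvm activate" relies on can be cross-checked as well; a sketch assuming the stock ceph-volume and LVM tools, with the VG name taken from the device path above:)

    ceph-volume lvm list                                                       # shows osd id, osd fsid and db device per OSD
    lvs -o lv_name,vg_name,lv_tags ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f   # raw ceph.* tags on the LV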
But when I try to start it by hand : "systemctl restart ceph-osd@1" ,
I get this in the logs : ("journalctl -b -u ceph-osd@1")
-- Logs begin at Tue 2020-10-13 19:09:49 EEST, end at Fri 2020-10-23 16:59:38 EEST. --
Oct 23 16:59:36 p10s systemd[1]: Starting Ceph object storage daemon osd.1...
Oct 23 16:59:36 p10s systemd[1]: Started Ceph object storage daemon osd.1.
Oct 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
Oct 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
Oct 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
Oct 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.943+0300 7f513cebedc0 -1 AuthRegistry(0x560776222940) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
Oct 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
Oct 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-1/keyring: (2) No such file or directory
Oct 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
Oct 23 16:59:36 p10s ceph-osd[3987]: 2020-10-23T16:59:36.947+0300 7f513cebedc0 -1 AuthRegistry(0x7fff46ea5d80) no keyring found at /var/lib/ceph/osd/ceph-1/keyring, disabling cephx
Oct 23 16:59:36 p10s ceph-osd[3987]: failed to fetch mon config (--no-mon-config to skip)
Oct 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Main process exited, code=exited, status=1/FAILURE
Oct 23 16:59:36 p10s systemd[1]: ceph-osd@1.service: Failed with result 'exit-code'.
And so my question is: how do I make this OSD known to the Ceph cluster again without recreating it from scratch with ceph-volume?
I see that every folder under "/var/lib/ceph/osd/" is a tmpfs mount point filled with the appropriate files and symlinks, except for "/var/lib/ceph/osd/ceph-1",
which is just an empty folder not mounted anywhere.
I tried to run
"ceph-bluestore-tool prime-osd-dir --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 --path /var/lib/ceph/osd/ceph-1"
It created some files under /var/lib/ceph/osd/ceph-1, but without a tmpfs mount, and those files belonged to root. I changed their owner to ceph.ceph
and created the appropriate symlinks for block and block.db, but ceph-osd@1 still did not want to start. Only the "unable to find keyring" messages disappeared.
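(For completeness, the manual sequence I was trying to reproduce would look roughly like this; a sketch of what ceph-volume's activate step does for a bluestore OSD, assuming the paths above, not a verified recipe:)

    mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
    ceph-bluestore-tool prime-osd-dir --dev /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 --path /var/lib/ceph/osd/ceph-1
    ln -snf /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9dcb-81a1661ee202 /var/lib/ceph/osd/ceph-1/block
    ln -snf /dev/nvme0n1p4 /var/lib/ceph/osd/ceph-1/block.db
    ceph auth get osd.1 -o /var/lib/ceph/osd/ceph-1/keyring   # the keyring that was missing in my attempt
    chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
    chown ceph:ceph /dev/nvme0n1p4                            # the db partition must be writable by the ceph user as well
    systemctl restart ceph-osd@1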
Please share any hints on where to move next.
Thanks in advance for your help.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx