That doesn't sound right. I had a single-node cluster deployed with
16.2.5 and tried to reproduce. I only installed cephadm, copied the
cephadm public key to the new node, and added the node to the cluster
via the dashboard. Then I added some disks to it and they were
successfully deployed as OSDs. The second node doesn't have a local
ceph.conf in /etc/ceph:
pacific2:~ # ll /etc/ceph/
total 4
-rw-r--r-- 1 root root 92 Dec 23 2021 rbdmap
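For context, a sketch of the reproduction steps described above, done from the CLI rather than the dashboard (the host name and IP are placeholders; the dashboard's "Add Host" form ends up doing the same thing):

```shell
# Copy the cluster's cephadm SSH key to the new node, then add the host.
ssh-copy-id -f -i /etc/ceph/ceph.pub root@pacific2
ceph orch host add pacific2 192.168.1.42

# With this (optional) spec in place, cephadm picks up new disks
# on the host automatically, as described above:
ceph orch apply osd --all-available-devices
```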
Could this be somehow related to ceph-ansible (which I don't use)?
Quoting Brent Kennedy <bkennedy@xxxxxxxxxx>:
I didn't have a chance to run this while things were down. I went
ahead and purged the OSDs again, then went to the OSD server and
deleted all the OSD instance folders in /var/lib/ceph. I waited for
the cluster to clear out the daemons and then zapped the drives.
The one thing I did differently this time was to place a copy of
ceph.conf in the /etc/ceph folder on the new server. I had not put
the conf file there before because I thought cephadm doesn't require
local configurations. But with the configuration file in place this
time, the drives were automatically added after the zap and all of
them came up without issue. I never thought to put the file there
because the first cluster I spun up with cephadm (not a conversion,
it was a fresh install) doesn't have ceph.conf files or an /etc/ceph
directory on any of the servers. While searching around I found
someone mention that an OSD wouldn't come up under cephadm without a
conf file, showing a strange error, and the fix was to put the file
in place.
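A hedged sketch of the purge/zap cycle described above, with osd.37, the fsid, and /dev/sdb as placeholder IDs and devices:

```shell
# Remove the OSD from the CRUSH map, auth database, and osdmap.
ceph osd purge 37 --yes-i-really-mean-it

# On the OSD host: remove the daemon's data directory (cephadm layout).
rm -rf /var/lib/ceph/<fsid>/osd.37

# Wipe the device so it shows as available again; an active OSD spec
# will then redeploy onto it automatically.
ceph orch device zap osdserver6 /dev/sdb --force
```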
I gather this is a legacy setting that I need to clean up and update
in the cephadm configuration? Is this documented as something to do
as part of the conversion, and did I miss it? I don't see the point
in relying on local conf files if everything is kept in the cluster.
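On the conf-file question: cephadm can maintain /etc/ceph/ceph.conf on managed hosts itself, rather than relying on hand-copied files. A sketch of the relevant knobs (available since Octopus):

```shell
# Tell the cephadm mgr module to keep /etc/ceph/ceph.conf in sync
# on every managed host.
ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true

# A minimal client conf can also be generated from the cluster's
# stored configuration at any time:
ceph config generate-minimal-conf
```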
Thanks for looking at this, guys! I will keep that command handy;
I'm still learning the ins and outs of cephadm.
-Brent
From: Adam King <adking@xxxxxxxxxx>
Sent: Monday, October 17, 2022 2:25 PM
To: Brent Kennedy <bkennedy@xxxxxxxxxx>
Cc: Eugen Block <eblock@xxxxxx>; ceph-users@xxxxxxx
Subject: Re: Re: Cephadm - Adding host to migrated cluster
Do the journal logs for the OSDs say anything about why they
couldn't start up? ("cephadm ls --no-detail" run on the host will
give the systemd units for each daemon on the host, so you can get
to them more easily.)
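The suggestion above can be sketched like this; the fsid and OSD id are placeholders taken from whatever "cephadm ls" reports on the host:

```shell
# List daemons on this host and their systemd unit names.
cephadm ls --no-detail

# Then pull the journal for a specific OSD unit, e.g.:
journalctl -u ceph-<fsid>@osd.37 --no-pager -n 200
```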
On Mon, Oct 17, 2022 at 1:37 PM Brent Kennedy <bkennedy@xxxxxxxxxx> wrote:
Below is what the ceph mgr log says as soon as I zap the disks and it
tries to add them. Note that the crash and node-exporter containers
were started from the cluster when the node was added (no issues or
manual involvement).
0376a72700 0 log_channel(cephadm) log [INF] : Detected new or changed
devices on server6
2022-10-17T17:31:39.585+0000 7f036d35f700 0 log_channel(cluster) log [DBG]
: pgmap v1006971: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
TiB / 178 TiB avail
2022-10-17T17:31:39.647+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 36e9a875-1bf6-4d2f-9440-0b875cc408a6 does not exist
2022-10-17T17:31:39.648+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev c9b942c6-0a54-4a1d-9019-dfdaf4e6e36c does not exist
2022-10-17T17:31:39.648+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 18b2e448-e522-49c0-a817-5edafbcc3eb2 does not exist
2022-10-17T17:31:39.650+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 9e808b8e-174f-45ce-a898-75ad3c785fa8 does not exist
2022-10-17T17:31:39.651+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 92aa85cb-2a14-42f9-a0ff-e7dec5f59a29 does not exist
2022-10-17T17:31:39.652+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev eb267c4d-0886-4499-b89f-a9c417126d84 does not exist
2022-10-17T17:31:39.654+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 93123170-3b34-4001-b7e3-0afee8697769 does not exist
2022-10-17T17:31:39.655+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 8b237a77-f57a-4d5f-85c3-4313147c6e73 does not exist
2022-10-17T17:31:41.586+0000 7f036d35f700 0 log_channel(cluster) log [DBG]
: pgmap v1006972: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
TiB / 178 TiB avail
2022-10-17T17:31:42.240+0000 7f03581b5700 0 [rbd_support INFO root]
TrashPurgeScheduleHandler: load_schedules
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.37: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.50: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.51: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.52: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.54: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700 1 mgr finish mon failed to return
metadata for osd.53: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.55: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700 1 mgr finish mon failed to return
metadata for osd.56: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.57: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700 1 mgr finish mon failed to return
metadata for osd.58: (2) No such file or directory
2022-10-17T17:31:43.587+0000 7f036d35f700 0 log_channel(cluster) log [DBG]
: pgmap v1006974: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
TiB / 178 TiB avail
2022-10-17T17:31:45.043+0000 7f035fb44700 -1 mgr get_metadata_python
Requested missing service osd.37
2022-10-17T17:31:45.589+0000 7f036d35f700 0 log_channel(cluster) log [DBG]
: pgmap v1006975: 656 pgs: 656 active+clean; 12 TiB data,
-Brent
-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, October 17, 2022 12:52 PM
To: ceph-users@xxxxxxx
Subject: Re: Cephadm - Adding host to migrated cluster
Does the cephadm.log on that node reveal anything useful? What about the
(active) mgr log?
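The logs asked about above can be found as follows (a sketch; paths are the cephadm defaults):

```shell
# On the affected node: cephadm's per-host log.
tail -f /var/log/ceph/cephadm.log

# From any admin node: stream cephadm module events from the active mgr,
# or dump recent entries from the cephadm log channel.
ceph -W cephadm
ceph log last 50 info cephadm
```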
Quoting Brent Kennedy <bkennedy@xxxxxxxxxx>:
Greetings everyone,
We recently moved a ceph-ansible cluster running Pacific on CentOS 8
to CentOS 8 Stream, converted it to cephadm, and then upgraded to
Quincy. Everything with the transition worked, but recently we
decided to add another node to the cluster with 10 more drives. We
were able to go to the web interface and add the host (with the IP
and name), which spun up the basic management containers on the new
node. We then went to the OSD section to add the drives, which were
showing as available. They were all recognized, so the drives were
added via the web console. Cephadm spun up the OSDs, and that's
where things are stuck. The OSDs show up in the cluster but are out
now. They came up but were then marked down and later out. We
purged them, then zapped the drives, and after about 10 minutes
cephadm had added them back automatically. It then did the same
thing: showed them up, then down, and put them out. When I look at
"ceph osd tree", it shows the drives but they don't show up under
any host (they are on host osdserver6). I am trying to figure out
why they are not being put under a host, since the host was added to
cephadm and the server install checks with cephadm were good. The
maintenance containers are running on the host with no issues. Any
ideas would be greatly appreciated.
-16 36.38199 host osdserver5
20 ssd 3.63820 osd.20 up 1.00000 1.00000
22 ssd 3.63820 osd.22 up 1.00000 1.00000
23 ssd 3.63820 osd.23 up 1.00000 1.00000
24 ssd 3.63820 osd.24 up 1.00000 1.00000
44 ssd 3.63820 osd.44 up 1.00000 1.00000
45 ssd 3.63820 osd.45 up 1.00000 1.00000
46 ssd 3.63820 osd.46 up 1.00000 1.00000
47 ssd 3.63820 osd.47 up 1.00000 1.00000
48 ssd 3.63820 osd.48 up 1.00000 1.00000
49 ssd 3.63820 osd.49 up 1.00000 1.00000
37 0 osd.37 down 1.00000 1.00000
50 0 osd.50 down 1.00000 1.00000
51 0 osd.51 down 1.00000 1.00000
52 0 osd.52 down 1.00000 1.00000
53 0 osd.53 down 1.00000 1.00000
54 0 osd.54 down 1.00000 1.00000
55 0 osd.55 down 1.00000 1.00000
56 0 osd.56 down 1.00000 1.00000
57 0 osd.57 down 1.00000 1.00000
58 0 osd.58 down 1.00000 1.00000
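The tree above shows the new OSDs with zero weight and outside any host bucket. A few things worth checking for that symptom (a guess, not a confirmed diagnosis; osd.37 is a placeholder):

```shell
# Do OSDs register their CRUSH location at startup?
ceph config get osd osd_crush_update_on_start

# What does the cluster know about the OSD? An empty/ENOENT result here
# would match the "failed to return metadata" mgr errors above.
ceph osd metadata 37

# Reports the OSD's current crush_location, if any.
ceph osd find 37
```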
Regards,
-Brent
Existing Clusters:
Test: Quincy 17.2.3 (all virtual on nvme)
US Production (HDD): Octopus 15.2.16 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
UK Production (HDD): Nautilus 14.2.22 with 18 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
US Production (SSD): Quincy 17.2.3 Cephadm with 6 osd servers, 5 mons, 4 gateways, 2 iscsi gateways
UK Production (SSD): Quincy 17.2.3 with 6 osd servers, 5 mons, 4 gateways
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx