That doesn't sound right. I had a single-node cluster deployed with
16.2.5 and tried to reproduce. I only installed cephadm, copied the
cephadm public key to the new node, and added the node to the cluster
via the dashboard. Then I added some disks to it and they were
successfully deployed as OSDs. The second node doesn't have a local
ceph.conf in /etc/ceph:
pacific2:~ # ll /etc/ceph/
total 4
-rw-r--r-- 1 root root 92 Dec 23 2021 rbdmap
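For context, a sketch of the reproduction steps described above, done from the CLI rather than the dashboard (the host name and IP are placeholders; the dashboard's "Add Host" form ends up doing the same thing):

```shell
# Copy the cluster's cephadm SSH key to the new node, then add the host.
ssh-copy-id -f -i /etc/ceph/ceph.pub root@pacific2
ceph orch host add pacific2 192.168.1.42

# With this (optional) spec in place, cephadm picks up new disks
# on the host automatically, as described above:
ceph orch apply osd --all-available-devices
```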
Could this be somehow related to ceph-ansible (which I don't use)?
Quoting Brent Kennedy <bkennedy@xxxxxxxxxx>:
I didn't have a chance to run this while things were down. I went
ahead and purged the OSDs again, then went to the OSD server and
deleted all the OSD instance folders in /var/lib/ceph. I waited for
the cluster to clear out the daemons and then zapped the drives.
The one thing I did differently this time was to place a copy of
ceph.conf in the /etc/ceph folder on the new server. I had not put
the conf file there before because I thought cephadm doesn't require
local configurations. But with the configuration file in place this
time, the drives were automatically added after the zap and all of
them came up without issue. I never thought to put the file there
because the first cluster I spun up with cephadm (not a conversion,
it was a fresh install) doesn't have ceph.conf files or an /etc/ceph
directory on any of the servers. While searching around I found
someone mention that an OSD wouldn't come up under cephadm without a
conf file, showing a strange error, and the fix was to put the file
in place.
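A hedged sketch of the purge/zap cycle described above, with osd.37, the fsid, and /dev/sdb as placeholder IDs and devices:

```shell
# Remove the OSD from the CRUSH map, auth database, and osdmap.
ceph osd purge 37 --yes-i-really-mean-it

# On the OSD host: remove the daemon's data directory (cephadm layout).
rm -rf /var/lib/ceph/<fsid>/osd.37

# Wipe the device so it shows as available again; an active OSD spec
# will then redeploy onto it automatically.
ceph orch device zap osdserver6 /dev/sdb --force
```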
I gather this is a legacy setting that I need to clean up and update
in the cephadm configuration? Is this documented as something to do
as part of the conversion, and did I miss it? I don't see the point
in relying on local conf files if everything is kept in the cluster.
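On the conf-file question: cephadm can maintain /etc/ceph/ceph.conf on managed hosts itself, rather than relying on hand-copied files. A sketch of the relevant knobs (available since Octopus):

```shell
# Tell the cephadm mgr module to keep /etc/ceph/ceph.conf in sync
# on every managed host.
ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true

# A minimal client conf can also be generated from the cluster's
# stored configuration at any time:
ceph config generate-minimal-conf
```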
Thanks for looking at this, guys! I will keep that command handy;
I'm still learning the ins and outs of cephadm.
-Brent
From: Adam King <adking@xxxxxxxxxx>
Sent: Monday, October 17, 2022 2:25 PM
To: Brent Kennedy <bkennedy@xxxxxxxxxx>
Cc: Eugen Block <eblock@xxxxxx>; ceph-users@xxxxxxx
Subject: Re: Re: Cephadm - Adding host to migrated cluster
Do the journal logs for the OSDs say anything about why they
couldn't start up? ("cephadm ls --no-detail" run on the host will
give the systemd units for each daemon on the host, so you can get
to them more easily.)
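The suggestion above can be sketched like this; the fsid and OSD id are placeholders taken from whatever "cephadm ls" reports on the host:

```shell
# List daemons on this host and their systemd unit names.
cephadm ls --no-detail

# Then pull the journal for a specific OSD unit, e.g.:
journalctl -u ceph-<fsid>@osd.37 --no-pager -n 200
```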
On Mon, Oct 17, 2022 at 1:37 PM Brent Kennedy <bkennedy@xxxxxxxxxx> wrote:
Below is what the ceph mgr log says as soon as I zap the disks and it
tries to add them. Note that the crash and node-exporter containers
were started from the cluster when the node was added (no issues or
manual involvement).
0376a72700 0 log_channel(cephadm) log [INF] : Detected new or changed
devices on server6
2022-10-17T17:31:39.585+0000 7f036d35f700 0 log_channel(cluster) log [DBG]
: pgmap v1006971: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
TiB / 178 TiB avail
2022-10-17T17:31:39.647+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 36e9a875-1bf6-4d2f-9440-0b875cc408a6 does not exist
2022-10-17T17:31:39.648+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev c9b942c6-0a54-4a1d-9019-dfdaf4e6e36c does not exist
2022-10-17T17:31:39.648+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 18b2e448-e522-49c0-a817-5edafbcc3eb2 does not exist
2022-10-17T17:31:39.650+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 9e808b8e-174f-45ce-a898-75ad3c785fa8 does not exist
2022-10-17T17:31:39.651+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 92aa85cb-2a14-42f9-a0ff-e7dec5f59a29 does not exist
2022-10-17T17:31:39.652+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev eb267c4d-0886-4499-b89f-a9c417126d84 does not exist
2022-10-17T17:31:39.654+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 93123170-3b34-4001-b7e3-0afee8697769 does not exist
2022-10-17T17:31:39.655+0000 7f036bb5c700 0 [progress WARNING root]
complete: ev 8b237a77-f57a-4d5f-85c3-4313147c6e73 does not exist
2022-10-17T17:31:41.586+0000 7f036d35f700 0 log_channel(cluster) log [DBG]
: pgmap v1006972: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
TiB / 178 TiB avail
2022-10-17T17:31:42.240+0000 7f03581b5700 0 [rbd_support INFO root]
TrashPurgeScheduleHandler: load_schedules
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.37: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.50: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.51: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.52: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.54: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700 1 mgr finish mon failed to return
metadata for osd.53: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.55: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700 1 mgr finish mon failed to return
metadata for osd.56: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to return
metadata for osd.57: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700 1 mgr finish mon failed to return
metadata for osd.58: (2) No such file or directory
2022-10-17T17:31:43.587+0000 7f036d35f700 0 log_channel(cluster) log [DBG]
: pgmap v1006974: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
TiB / 178 TiB avail
2022-10-17T17:31:45.043+0000 7f035fb44700 -1 mgr get_metadata_python
Requested missing service osd.37
2022-10-17T17:31:45.589+0000 7f036d35f700 0 log_channel(cluster) log [DBG]
: pgmap v1006975: 656 pgs: 656 active+clean; 12 TiB data,
-Brent
-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, October 17, 2022 12:52 PM
To: ceph-users@xxxxxxx
Subject: Re: Cephadm - Adding host to migrated cluster
Does the cephadm.log on that node reveal anything useful? What about the
(active) mgr log?
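The logs asked about above can be found as follows (a sketch; paths are the cephadm defaults):

```shell
# On the affected node: cephadm's per-host log.
tail -f /var/log/ceph/cephadm.log

# From any admin node: stream cephadm module events from the active mgr,
# or dump recent entries from the cephadm log channel.
ceph -W cephadm
ceph log last 50 info cephadm
```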
Quoting Brent Kennedy <bkennedy@xxxxxxxxxx>:
Greetings everyone,
We recently moved a ceph-ansible cluster running Pacific on CentOS 8
to CentOS 8 Stream, converted it to cephadm, and then upgraded to
Quincy. Everything with the transition worked, but recently we
decided to add another node to the cluster with 10 more drives. We
were able to go to the web interface and add the host (with the IP
and name), which spun up the basic management containers on the new
node. We then went to the OSD section to add the drives, which were
showing as available. They were all recognized, so the drives were
added via the web console. Cephadm spun up the OSDs, and that's
where things are stuck. The OSDs show up in the cluster but are out
now. They came up but were then marked down and later out. We
purged them, then zapped the drives, and after about 10 minutes
cephadm had added them back automatically. It then did the same
thing: showed them up, then down, and put them out. When I look at
"ceph osd tree", it shows the drives but they don't show up under
any host (they are on host osdserver6). I am trying to figure out
why they are not being put under a host, since the host was added to
cephadm and the server install checks with cephadm were good. The
maintenance containers are running on the host with no issues. Any
ideas would be greatly appreciated.
-16 36.38199 host osdserver5
20 ssd 3.63820 osd.20 up 1.00000 1.00000
22 ssd 3.63820 osd.22 up 1.00000 1.00000
23 ssd 3.63820 osd.23 up 1.00000 1.00000
24 ssd 3.63820 osd.24 up 1.00000 1.00000
44 ssd 3.63820 osd.44 up 1.00000 1.00000
45 ssd 3.63820 osd.45 up 1.00000 1.00000
46 ssd 3.63820 osd.46 up 1.00000 1.00000
47 ssd 3.63820 osd.47 up 1.00000 1.00000
48 ssd 3.63820 osd.48 up 1.00000 1.00000
49 ssd 3.63820 osd.49 up 1.00000 1.00000
37 0 osd.37 down 1.00000 1.00000
50 0 osd.50 down 1.00000 1.00000
51 0 osd.51 down 1.00000 1.00000
52 0 osd.52 down 1.00000 1.00000
53 0 osd.53 down 1.00000 1.00000
54 0 osd.54 down 1.00000 1.00000
55 0 osd.55 down 1.00000 1.00000
56 0 osd.56 down 1.00000 1.00000
57 0 osd.57 down 1.00000 1.00000
58 0 osd.58 down 1.00000 1.00000
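The tree above shows the new OSDs with zero weight and outside any host bucket. A few things worth checking for that symptom (a guess, not a confirmed diagnosis; osd.37 is a placeholder):

```shell
# Do OSDs register their CRUSH location at startup?
ceph config get osd osd_crush_update_on_start

# What does the cluster know about the OSD? An empty/ENOENT result here
# would match the "failed to return metadata" mgr errors above.
ceph osd metadata 37

# Reports the OSD's current crush_location, if any.
ceph osd find 37
```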
Regards,
-Brent
Existing Clusters:
Test: Quincy 17.2.3 (all virtual on nvme)
US Production (HDD): Octopus 15.2.16 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
UK Production (HDD): Nautilus 14.2.22 with 18 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
US Production (SSD): Quincy 17.2.3 Cephadm with 6 osd servers, 5 mons, 4 gateways, 2 iscsi gateways
UK Production (SSD): Quincy 17.2.3 with 6 osd servers, 5 mons, 4 gateways
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx