Re: Cephadm - Adding host to migrated cluster

I didn’t have a chance to run this while things were down.  I went ahead and purged the OSDs again, went to the OSD server and deleted all of the OSD instance folders in /var/lib/ceph, waited for the cluster to clear out the daemons, and then zapped the drives.  The one thing I did differently this time was to place a copy of ceph.conf in /etc/ceph on the new server.  I hadn't put the conf file there before because I thought cephadm didn't require local configuration, but with the file in place the drives were added automatically after the zap and all of them came up without issue.  It never occurred to me to put the file there because the first cluster I spun up with cephadm (a fresh install, not a conversion) doesn't have ceph.conf files or an /etc/ceph directory on any of the servers.  Some googlefu turned up someone mentioning that an OSD wouldn't come up under cephadm without a conf file and was showing a strange error; the fix was to put the file in place.
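
For reference, the rough sequence I ran looked something like this (the OSD id, fsid, and device path below are placeholders, not the exact values from this cluster):

# purge the stuck OSD from the cluster map
ceph osd purge 37 --yes-i-really-mean-it

# on the OSD host, remove the leftover daemon directory under /var/lib/ceph
rm -rf /var/lib/ceph/<fsid>/osd.37

# wipe the disk so cephadm sees it as available again
ceph orch device zap osdserver6 /dev/sdX --force

# the one change this time: put a copy of the cluster config on the new host
scp /etc/ceph/ceph.conf root@osdserver6:/etc/ceph/ceph.conf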

 

I gather this is a legacy setting that I need to clean up and update in the cephadm configuration?  Is this documented as something to do as part of the conversion, and did I just miss it?  I don't see the point in relying on local conf files if everything is kept in the cluster.
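
If it helps clarify what I mean by cleaning it up, I was thinking of something along these lines, assuming the mgr/cephadm option name below is correct for Quincy (I haven't verified it yet):

# generate a minimal conf from what the cluster already knows
ceph config generate-minimal-conf > /etc/ceph/ceph.conf

# let cephadm distribute and keep /etc/ceph/ceph.conf in sync on managed hosts
ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true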

 

Thanks for looking at this, guys!  I will keep that command handy; I'm still learning the ins and outs of cephadm.

 

-Brent

 

From: Adam King <adking@xxxxxxxxxx> 
Sent: Monday, October 17, 2022 2:25 PM
To: Brent Kennedy <bkennedy@xxxxxxxxxx>
Cc: Eugen Block <eblock@xxxxxx>; ceph-users@xxxxxxx
Subject: Re:  Re: Cephadm - Adding host to migrated cluster

 

Do the journal logs for the OSDs say anything about why they couldn't start? (Running "cephadm ls --no-detail" on the host will list the systemd unit for each daemon on the host, so you can get to them more easily.)
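
For example, something like this should get you to the right journal (the fsid below is a placeholder; use the systemd unit names that "cephadm ls --no-detail" actually reports):

cephadm ls --no-detail
# then, for one of the failing OSDs:
journalctl -u ceph-<fsid>@osd.37 --no-pager -n 200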

 

On Mon, Oct 17, 2022 at 1:37 PM Brent Kennedy <bkennedy@xxxxxxxxxx> wrote:

Below is what the ceph mgr log shows as soon as I zap the disks and it
tries to add them.  Note that the crash and node-exporter containers were
started by the cluster when the node was added, with no issues or manual
involvement.

0376a72700  0 log_channel(cephadm) log [INF] : Detected new or changed devices on server6
2022-10-17T17:31:39.585+0000 7f036d35f700  0 log_channel(cluster) log [DBG] : pgmap v1006971: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142 TiB / 178 TiB avail
2022-10-17T17:31:39.647+0000 7f036bb5c700  0 [progress WARNING root] complete: ev 36e9a875-1bf6-4d2f-9440-0b875cc408a6 does not exist
2022-10-17T17:31:39.648+0000 7f036bb5c700  0 [progress WARNING root] complete: ev c9b942c6-0a54-4a1d-9019-dfdaf4e6e36c does not exist
2022-10-17T17:31:39.648+0000 7f036bb5c700  0 [progress WARNING root] complete: ev 18b2e448-e522-49c0-a817-5edafbcc3eb2 does not exist
2022-10-17T17:31:39.650+0000 7f036bb5c700  0 [progress WARNING root] complete: ev 9e808b8e-174f-45ce-a898-75ad3c785fa8 does not exist
2022-10-17T17:31:39.651+0000 7f036bb5c700  0 [progress WARNING root] complete: ev 92aa85cb-2a14-42f9-a0ff-e7dec5f59a29 does not exist
2022-10-17T17:31:39.652+0000 7f036bb5c700  0 [progress WARNING root] complete: ev eb267c4d-0886-4499-b89f-a9c417126d84 does not exist
2022-10-17T17:31:39.654+0000 7f036bb5c700  0 [progress WARNING root] complete: ev 93123170-3b34-4001-b7e3-0afee8697769 does not exist
2022-10-17T17:31:39.655+0000 7f036bb5c700  0 [progress WARNING root] complete: ev 8b237a77-f57a-4d5f-85c3-4313147c6e73 does not exist
2022-10-17T17:31:41.586+0000 7f036d35f700  0 log_channel(cluster) log [DBG] : pgmap v1006972: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142 TiB / 178 TiB avail
2022-10-17T17:31:42.240+0000 7f03581b5700  0 [rbd_support INFO root] TrashPurgeScheduleHandler: load_schedules
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return metadata for osd.37: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return metadata for osd.50: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return metadata for osd.51: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return metadata for osd.52: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return metadata for osd.54: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700  1 mgr finish mon failed to return metadata for osd.53: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return metadata for osd.55: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700  1 mgr finish mon failed to return metadata for osd.56: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return metadata for osd.57: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700  1 mgr finish mon failed to return metadata for osd.58: (2) No such file or directory
2022-10-17T17:31:43.587+0000 7f036d35f700  0 log_channel(cluster) log [DBG] : pgmap v1006974: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142 TiB / 178 TiB avail
2022-10-17T17:31:45.043+0000 7f035fb44700 -1 mgr get_metadata_python Requested missing service osd.37
2022-10-17T17:31:45.589+0000 7f036d35f700  0 log_channel(cluster) log [DBG] : pgmap v1006975: 656 pgs: 656 active+clean; 12 TiB data,

-Brent

-----Original Message-----
From: Eugen Block <eblock@xxxxxx>
Sent: Monday, October 17, 2022 12:52 PM
To: ceph-users@xxxxxxx
Subject: Re: Cephadm - Adding host to migrated cluster

Does the cephadm.log on that node reveal anything useful? What about the
(active) mgr log?
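
For example (the mgr daemon name below is just an example; check which mgr is active first):

# cephadm's own log on the new node
tail -n 200 /var/log/ceph/cephadm.log

# find the active mgr, then pull that daemon's log
ceph mgr stat
cephadm logs --name mgr.<active-mgr>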

Zitat von Brent Kennedy <bkennedy@xxxxxxxxxx>:

> Greetings everyone,
>
>
>
> We recently moved a ceph-ansible cluster running Pacific on CentOS 8
> to CentOS 8 Stream, converted it to cephadm, and then upgraded to
> Quincy.  Everything with the transition worked, but recently we decided
> to add another node to the cluster with 10 more drives.  We were able
> to go to the web interface and add the host (with the IP and name),
> which spun up the basic management containers on the new node.  We then
> went to the OSD section to add the drives, which were showing as
> available.  They were all recognized, so the drives were added via the
> web console.  Cephadm spun up the OSDs, and that's where things are
> stuck.  The OSDs show up in the cluster but are out now: they came up,
> were then marked down, and were later marked out.  We purged them and
> zapped the drives, and after about 10 minutes cephadm had added them
> back automatically.  It then did the same thing: showed them up, then
> down, and put them out.  When I look at "ceph osd tree", the drives are
> listed but they don't show up under any host (they are on host
> osdserver6).  I am trying to figure out why they are not being placed
> under a host, since the host was added to cephadm and the server
> install checks with cephadm were good.  The maintenance containers are
> running on the host with no issues.  Any ideas would be greatly
> appreciated.
>
>
>
> -16          36.38199      host osdserver5
> 20    ssd    3.63820          osd.20              up   1.00000  1.00000
> 22    ssd    3.63820          osd.22              up   1.00000  1.00000
> 23    ssd    3.63820          osd.23              up   1.00000  1.00000
> 24    ssd    3.63820          osd.24              up   1.00000  1.00000
> 44    ssd    3.63820          osd.44              up   1.00000  1.00000
> 45    ssd    3.63820          osd.45              up   1.00000  1.00000
> 46    ssd    3.63820          osd.46              up   1.00000  1.00000
> 47    ssd    3.63820          osd.47              up   1.00000  1.00000
> 48    ssd    3.63820          osd.48              up   1.00000  1.00000
> 49    ssd    3.63820          osd.49              up   1.00000  1.00000
> 37                 0  osd.37                    down   1.00000  1.00000
> 50                 0  osd.50                    down   1.00000  1.00000
> 51                 0  osd.51                    down   1.00000  1.00000
> 52                 0  osd.52                    down   1.00000  1.00000
> 53                 0  osd.53                    down   1.00000  1.00000
> 54                 0  osd.54                    down   1.00000  1.00000
> 55                 0  osd.55                    down   1.00000  1.00000
> 56                 0  osd.56                    down   1.00000  1.00000
> 57                 0  osd.57                    down   1.00000  1.00000
> 58                 0  osd.58                    down   1.00000  1.00000
>
>
>
>
>
> Regards,
>
> -Brent
>
>
>
> Existing Clusters:
>
> Test: Quincy 17.2.3 ( all virtual on nvme )
>
> US Production(HDD): Octopus 15.2.16 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
>
> UK Production(HDD): Nautilus 14.2.22 with 18 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
>
> US Production(SSD): Quincy 17.2.3 Cephadm with 6 osd servers, 5 mons, 4 gateways, 2 iscsi gateways
>
> UK Production(SSD): Quincy 17.2.3 with 6 osd servers, 5 mons, 4 gateways
>
>
>
>
>





_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



