Re: Cephadm - Adding host to migrated cluster

Below is what the ceph mgr log shows as soon as I zap the disks and cephadm
tries to add them again.  Note: the crash and node-exporter containers were
started by the cluster when the node was added (no issues or manual
involvement).

0376a72700  0 log_channel(cephadm) log [INF] : Detected new or changed
devices on server6
2022-10-17T17:31:39.585+0000 7f036d35f700  0 log_channel(cluster) log [DBG]
: pgmap v1006971: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
TiB / 178 TiB avail
2022-10-17T17:31:39.647+0000 7f036bb5c700  0 [progress WARNING root]
complete: ev 36e9a875-1bf6-4d2f-9440-0b875cc408a6 does not exist
2022-10-17T17:31:39.648+0000 7f036bb5c700  0 [progress WARNING root]
complete: ev c9b942c6-0a54-4a1d-9019-dfdaf4e6e36c does not exist
2022-10-17T17:31:39.648+0000 7f036bb5c700  0 [progress WARNING root]
complete: ev 18b2e448-e522-49c0-a817-5edafbcc3eb2 does not exist
2022-10-17T17:31:39.650+0000 7f036bb5c700  0 [progress WARNING root]
complete: ev 9e808b8e-174f-45ce-a898-75ad3c785fa8 does not exist
2022-10-17T17:31:39.651+0000 7f036bb5c700  0 [progress WARNING root]
complete: ev 92aa85cb-2a14-42f9-a0ff-e7dec5f59a29 does not exist
2022-10-17T17:31:39.652+0000 7f036bb5c700  0 [progress WARNING root]
complete: ev eb267c4d-0886-4499-b89f-a9c417126d84 does not exist
2022-10-17T17:31:39.654+0000 7f036bb5c700  0 [progress WARNING root]
complete: ev 93123170-3b34-4001-b7e3-0afee8697769 does not exist
2022-10-17T17:31:39.655+0000 7f036bb5c700  0 [progress WARNING root]
complete: ev 8b237a77-f57a-4d5f-85c3-4313147c6e73 does not exist
2022-10-17T17:31:41.586+0000 7f036d35f700  0 log_channel(cluster) log [DBG]
: pgmap v1006972: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
TiB / 178 TiB avail
2022-10-17T17:31:42.240+0000 7f03581b5700  0 [rbd_support INFO root]
TrashPurgeScheduleHandler: load_schedules
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return
metadata for osd.37: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return
metadata for osd.50: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return
metadata for osd.51: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return
metadata for osd.52: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return
metadata for osd.54: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700  1 mgr finish mon failed to return
metadata for osd.53: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return
metadata for osd.55: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700  1 mgr finish mon failed to return
metadata for osd.56: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to return
metadata for osd.57: (2) No such file or directory
2022-10-17T17:31:42.460+0000 7f03a6bcd700  1 mgr finish mon failed to return
metadata for osd.58: (2) No such file or directory
2022-10-17T17:31:43.587+0000 7f036d35f700  0 log_channel(cluster) log [DBG]
: pgmap v1006974: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
TiB / 178 TiB avail
2022-10-17T17:31:45.043+0000 7f035fb44700 -1 mgr get_metadata_python
Requested missing service osd.37
2022-10-17T17:31:45.589+0000 7f036d35f700  0 log_channel(cluster) log [DBG]
: pgmap v1006975: 656 pgs: 656 active+clean; 12 TiB data,
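
For anyone following along, the "failed to return metadata for osd.NN" lines
mean those OSD IDs exist in the osdmap but no running daemon has registered
its metadata with the mons, i.e. the OSD containers on osdserver6 most likely
never start (or exit right after creation).  A few checks along these lines
might narrow it down (host name and OSD id below are taken from this thread,
adjust as needed):

# Did cephadm deploy OSD daemons on the new host, and are they running?
ceph orch ps osdserver6

# What does the orchestrator currently see on that host's disks?
ceph orch device ls osdserver6 --wide --refresh

# On osdserver6 itself: were LVs actually created for the new OSDs?
cephadm ceph-volume lvm list

# Metadata for one of the affected OSDs (expected to fail while it is down)
ceph osd metadata 37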

-Brent

-----Original Message-----
From: Eugen Block <eblock@xxxxxx> 
Sent: Monday, October 17, 2022 12:52 PM
To: ceph-users@xxxxxxx
Subject:  Re: Cephadm - Adding host to migrated cluster

Does the cephadm.log on that node reveal anything useful? What about the
(active) mgr log?
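
For reference, those logs can be pulled like this (the daemon name below is
just an example from this thread):

# cephadm.log is written on each host
less /var/log/ceph/cephadm.log

# Follow the cephadm module's log channel from the active mgr
ceph -W cephadm

# Or dump the most recent entries
ceph log last 200 debug cephadm

# Journal of a specific daemon, run on the host it lives on
cephadm logs --name osd.37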

Zitat von Brent Kennedy <bkennedy@xxxxxxxxxx>:

> Greetings everyone,
>
>
>
> We recently moved a ceph-ansible cluster running Pacific on CentOS 8
> to CentOS 8 Stream, converted it to cephadm, and then upgraded to
> Quincy.  Everything with the transition worked, but recently we
> decided to add another node to the cluster with 10 more drives.  We
> were able to go to the web interface and add the host (with the IP
> and name), which spun up the basic management containers on the new
> node.  We then went to the OSD section, where the drives were showing
> as available.  They were all recognized, so we added them via the web
> console.  Cephadm spun up the OSDs, and that's where things are
> stuck.  The OSDs show up in the cluster but are out now: they came
> up, were then marked down, and were later marked out.  We purged
> them, zapped the drives, and after about 10 minutes cephadm had added
> them back automatically.  It then did the same thing: they showed up,
> went down, and were put out.  When I look at "ceph osd tree", it
> shows the drives, but they don't appear under any host (they are on
> host osdserver6).  I am trying to figure out why they are not being
> placed under a host, since the host was added to cephadm and the
> cephadm host checks passed.  The maintenance containers are running
> on the host with no issues.  Any ideas would be greatly appreciated.
>
>
>
> -16          36.38199      host osdserver5
>
> 20    ssd    3.63820          osd.20              up   1.00000  1.00000
>
> 22    ssd    3.63820          osd.22              up   1.00000  1.00000
>
> 23    ssd    3.63820          osd.23              up   1.00000  1.00000
>
> 24    ssd    3.63820          osd.24              up   1.00000  1.00000
>
> 44    ssd    3.63820          osd.44              up   1.00000  1.00000
>
> 45    ssd    3.63820          osd.45              up   1.00000  1.00000
>
> 46    ssd    3.63820          osd.46              up   1.00000  1.00000
>
> 47    ssd    3.63820          osd.47              up   1.00000  1.00000
>
> 48    ssd    3.63820          osd.48              up   1.00000  1.00000
>
> 49    ssd    3.63820          osd.49              up   1.00000  1.00000
>
> 37                 0  osd.37                    down   1.00000  1.00000
>
> 50                 0  osd.50                    down   1.00000  1.00000
>
> 51                 0  osd.51                    down   1.00000  1.00000
>
> 52                 0  osd.52                    down   1.00000  1.00000
>
> 53                 0  osd.53                    down   1.00000  1.00000
>
> 54                 0  osd.54                    down   1.00000  1.00000
>
> 55                 0  osd.55                    down   1.00000  1.00000
>
> 56                 0  osd.56                    down   1.00000  1.00000
>
> 57                 0  osd.57                    down   1.00000  1.00000
>
> 58                 0  osd.58                    down   1.00000  1.00000
>
>
>
>
>
> Regards,
>
> -Brent
>
>
>
> Existing Clusters:
>
> Test: Quincy 17.2.3 (all virtual on NVMe)
>
> US Production (HDD): Octopus 15.2.16 with 11 OSD servers, 3 mons, 4 gateways, 2 iSCSI gateways
>
> UK Production (HDD): Nautilus 14.2.22 with 18 OSD servers, 3 mons, 4 gateways, 2 iSCSI gateways
>
> US Production (SSD): Quincy 17.2.3 cephadm with 6 OSD servers, 5 mons, 4 gateways, 2 iSCSI gateways
>
> UK Production (SSD): Quincy 17.2.3 with 6 OSD servers, 5 mons, 4 gateways
>
>
>
>
>
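A note on the "not under any host" symptom: an OSD normally sets its own
CRUSH location when the daemon boots (osd_crush_update_on_start is on by
default), so new OSDs sitting at weight 0 outside any host bucket usually
just means the daemons never came up, which matches the down/out behaviour
above.  If they do start and still land outside the host, they can be placed
manually, roughly like this (bucket name and weight are taken from the tree
above):

# Create the host bucket if it is missing from the CRUSH map,
# and hook it under the default root
ceph osd crush add-bucket osdserver6 host
ceph osd crush move osdserver6 root=default

# Put an individual OSD under its host with its intended weight
ceph osd crush set osd.37 3.63820 host=osdserver6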




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


