Re: Cephadm - Adding host to migrated cluster

Adam King <adking@xxxxxxxxxx> · Mon, 17 Oct 2022 14:25:19 -0400

Do the journal logs for the OSDs say anything about why they couldn't start
up? ("cephadm ls --no-detail" run on the host will give the systemd units
for each daemon on the host so you can get them easier).

On Mon, Oct 17, 2022 at 1:37 PM Brent Kennedy <bkennedy@xxxxxxxxxx> wrote:

> Below is what the ceph mgr log is saying as soon as I zap the disks and it
> tries to add them.  Note, the crash and node exporter containers were
> started from the cluster when the node was added to the cluster( no issues
> or manual involvement ).
>
> 0376a72700  0 log_channel(cephadm) log [INF] : Detected new or changed
> devices on server6
> 2022-10-17T17:31:39.585+0000 7f036d35f700  0 log_channel(cluster) log [DBG]
> : pgmap v1006971: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
> TiB / 178 TiB avail
> 2022-10-17T17:31:39.647+0000 7f036bb5c700  0 [progress WARNING root]
> complete: ev 36e9a875-1bf6-4d2f-9440-0b875cc408a6 does not exist
> 2022-10-17T17:31:39.648+0000 7f036bb5c700  0 [progress WARNING root]
> complete: ev c9b942c6-0a54-4a1d-9019-dfdaf4e6e36c does not exist
> 2022-10-17T17:31:39.648+0000 7f036bb5c700  0 [progress WARNING root]
> complete: ev 18b2e448-e522-49c0-a817-5edafbcc3eb2 does not exist
> 2022-10-17T17:31:39.650+0000 7f036bb5c700  0 [progress WARNING root]
> complete: ev 9e808b8e-174f-45ce-a898-75ad3c785fa8 does not exist
> 2022-10-17T17:31:39.651+0000 7f036bb5c700  0 [progress WARNING root]
> complete: ev 92aa85cb-2a14-42f9-a0ff-e7dec5f59a29 does not exist
> 2022-10-17T17:31:39.652+0000 7f036bb5c700  0 [progress WARNING root]
> complete: ev eb267c4d-0886-4499-b89f-a9c417126d84 does not exist
> 2022-10-17T17:31:39.654+0000 7f036bb5c700  0 [progress WARNING root]
> complete: ev 93123170-3b34-4001-b7e3-0afee8697769 does not exist
> 2022-10-17T17:31:39.655+0000 7f036bb5c700  0 [progress WARNING root]
> complete: ev 8b237a77-f57a-4d5f-85c3-4313147c6e73 does not exist
> 2022-10-17T17:31:41.586+0000 7f036d35f700  0 log_channel(cluster) log [DBG]
> : pgmap v1006972: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
> TiB / 178 TiB avail
> 2022-10-17T17:31:42.240+0000 7f03581b5700  0 [rbd_support INFO root]
> TrashPurgeScheduleHandler: load_schedules
> 2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to
> return
> metadata for osd.37: (2) No such file or directory
> 2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to
> return
> metadata for osd.50: (2) No such file or directory
> 2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to
> return
> metadata for osd.51: (2) No such file or directory
> 2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to
> return
> metadata for osd.52: (2) No such file or directory
> 2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to
> return
> metadata for osd.54: (2) No such file or directory
> 2022-10-17T17:31:42.460+0000 7f03a6bcd700  1 mgr finish mon failed to
> return
> metadata for osd.53: (2) No such file or directory
> 2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to
> return
> metadata for osd.55: (2) No such file or directory
> 2022-10-17T17:31:42.460+0000 7f03a6bcd700  1 mgr finish mon failed to
> return
> metadata for osd.56: (2) No such file or directory
> 2022-10-17T17:31:42.460+0000 7f03a73ce700  1 mgr finish mon failed to
> return
> metadata for osd.57: (2) No such file or directory
> 2022-10-17T17:31:42.460+0000 7f03a6bcd700  1 mgr finish mon failed to
> return
> metadata for osd.58: (2) No such file or directory
> 2022-10-17T17:31:43.587+0000 7f036d35f700  0 log_channel(cluster) log [DBG]
> : pgmap v1006974: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142
> TiB / 178 TiB avail
> 2022-10-17T17:31:45.043+0000 7f035fb44700 -1 mgr get_metadata_python
> Requested missing service osd.37
> 2022-10-17T17:31:45.589+0000 7f036d35f700  0 log_channel(cluster) log [DBG]
> : pgmap v1006975: 656 pgs: 656 active+clean; 12 TiB data,
>
> -Brent
>
> -----Original Message-----
> From: Eugen Block <eblock@xxxxxx>
> Sent: Monday, October 17, 2022 12:52 PM
> To: ceph-users@xxxxxxx
> Subject:  Re: Cephadm - Adding host to migrated cluster
>
> Does the cephadm.log on that node reveal anything useful? What about the
> (active) mgr log?
>
> Zitat von Brent Kennedy <bkennedy@xxxxxxxxxx>:
>
> > Greetings everyone,
> >
> >
> >
> > We recently moved a ceph-ansible cluster running pacific on centos 8
> > to centos 8 stream and then upgraded to quincy using cephadm after
> > converting to cephadm.  Everything with the transition worked but
> > recently we decided to add another node to the cluster with 10 more
> > drives.  We were able to go to the web interface and add the host (
> > with the IP and name ), which spun up the basic management containers
> > on the new node.  We then went to the OSD section to add the drives
> > which were showing as available.  They were all recognized, so the
> > drives were added via the web console.  Cephadm spun up the OSDs and
> > that's where things are stuck.  The OSDs show up in the cluster but
> > are out now.  They came up but were then marked down and later out.
> > We purged them then zapped the drives and after about 10 minutes,
> > cephadm had added them back automatically.  It then did the same
> > thing, showed them up, then down and put them out.  When I look at
> > "ceph osd tree", it shows the drives but they don't show up under any
> > host ( they are on host osdserver6 ).  I am trying to figure out why
> > they are not being put under a host since the host server was added to
> > cephadm and the server install checks with cephadm were good.  The
> maintenance containers are running on the host, no issues.  Any ideas would
> be greatly appreciated.
> >
> >
> >
> > -16          36.38199      host osdserver5
> >
> > 20    ssd    3.63820          osd.20              up   1.00000  1.00000
> >
> > 22    ssd    3.63820          osd.22              up   1.00000  1.00000
> >
> > 23    ssd    3.63820          osd.23              up   1.00000  1.00000
> >
> > 24    ssd    3.63820          osd.24              up   1.00000  1.00000
> >
> > 44    ssd    3.63820          osd.44              up   1.00000  1.00000
> >
> > 45    ssd    3.63820          osd.45              up   1.00000  1.00000
> >
> > 46    ssd    3.63820          osd.46              up   1.00000  1.00000
> >
> > 47    ssd    3.63820          osd.47              up   1.00000  1.00000
> >
> > 48    ssd    3.63820          osd.48              up   1.00000  1.00000
> >
> > 49    ssd    3.63820          osd.49              up   1.00000  1.00000
> >
> > 37                 0  osd.37                    down   1.00000  1.00000
> >
> > 50                 0  osd.50                    down   1.00000  1.00000
> >
> > 51                 0  osd.51                    down   1.00000  1.00000
> >
> > 52                 0  osd.52                    down   1.00000  1.00000
> >
> > 53                 0  osd.53                    down   1.00000  1.00000
> >
> > 54                 0  osd.54                    down   1.00000  1.00000
> >
> > 55                 0  osd.55                    down   1.00000  1.00000
> >
> > 56                 0  osd.56                    down   1.00000  1.00000
> >
> > 57                 0  osd.57                    down   1.00000  1.00000
> >
> > 58                 0  osd.58                    down   1.00000  1.00000
> >
> >
> >
> >
> >
> > Regards,
> >
> > -Brent
> >
> >
> >
> > Existing Clusters:
> >
> > Test: Quincy 17.2.3 ( all virtual on nvme )
> >
> > US Production(HDD): Octopus 15.2.16 with 11 osd servers, 3 mons, 4
> > gateways,
> > 2 iscsi gateways
> >
> > UK Production(HDD): Nautilus 14.2.22 with 18 osd servers, 3 mons, 4
> > gateways, 2 iscsi gateways
> >
> > US Production(SSD): Quincy 17.2.3 Cephadm with 6 osd servers, 5 mons,
> > 4 gateways, 2 iscsi gateways
> >
> > UK Production(SSD): Quincy 17.2.3 with 6 osd servers, 5 mons, 4
> > gateways
> >
> >
> >
> >
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an
> > email to ceph-users-leave@xxxxxxx
>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email
> to ceph-users-leave@xxxxxxx
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx