Do the journal logs for the OSDs say anything about why they couldn't start up? ("cephadm ls --no-detail" run on the host will give the systemd units for each daemon on the host so you can get them easier). On Mon, Oct 17, 2022 at 1:37 PM Brent Kennedy <bkennedy@xxxxxxxxxx> wrote: > Below is what the ceph mgr log is saying as soon as I zap the disks and it > tries to add them. Note, the crash and node exporter containers were > started from the cluster when the node was added to the cluster( no issues > or manual involvement ). > > 0376a72700 0 log_channel(cephadm) log [INF] : Detected new or changed > devices on server6 > 2022-10-17T17:31:39.585+0000 7f036d35f700 0 log_channel(cluster) log [DBG] > : pgmap v1006971: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142 > TiB / 178 TiB avail > 2022-10-17T17:31:39.647+0000 7f036bb5c700 0 [progress WARNING root] > complete: ev 36e9a875-1bf6-4d2f-9440-0b875cc408a6 does not exist > 2022-10-17T17:31:39.648+0000 7f036bb5c700 0 [progress WARNING root] > complete: ev c9b942c6-0a54-4a1d-9019-dfdaf4e6e36c does not exist > 2022-10-17T17:31:39.648+0000 7f036bb5c700 0 [progress WARNING root] > complete: ev 18b2e448-e522-49c0-a817-5edafbcc3eb2 does not exist > 2022-10-17T17:31:39.650+0000 7f036bb5c700 0 [progress WARNING root] > complete: ev 9e808b8e-174f-45ce-a898-75ad3c785fa8 does not exist > 2022-10-17T17:31:39.651+0000 7f036bb5c700 0 [progress WARNING root] > complete: ev 92aa85cb-2a14-42f9-a0ff-e7dec5f59a29 does not exist > 2022-10-17T17:31:39.652+0000 7f036bb5c700 0 [progress WARNING root] > complete: ev eb267c4d-0886-4499-b89f-a9c417126d84 does not exist > 2022-10-17T17:31:39.654+0000 7f036bb5c700 0 [progress WARNING root] > complete: ev 93123170-3b34-4001-b7e3-0afee8697769 does not exist > 2022-10-17T17:31:39.655+0000 7f036bb5c700 0 [progress WARNING root] > complete: ev 8b237a77-f57a-4d5f-85c3-4313147c6e73 does not exist > 2022-10-17T17:31:41.586+0000 7f036d35f700 0 log_channel(cluster) log [DBG] > : pgmap v1006972: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142 > TiB / 178 TiB avail > 2022-10-17T17:31:42.240+0000 7f03581b5700 0 [rbd_support INFO root] > TrashPurgeScheduleHandler: load_schedules > 2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to > return > metadata for osd.37: (2) No such file or directory > 2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to > return > metadata for osd.50: (2) No such file or directory > 2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to > return > metadata for osd.51: (2) No such file or directory > 2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to > return > metadata for osd.52: (2) No such file or directory > 2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to > return > metadata for osd.54: (2) No such file or directory > 2022-10-17T17:31:42.460+0000 7f03a6bcd700 1 mgr finish mon failed to > return > metadata for osd.53: (2) No such file or directory > 2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to > return > metadata for osd.55: (2) No such file or directory > 2022-10-17T17:31:42.460+0000 7f03a6bcd700 1 mgr finish mon failed to > return > metadata for osd.56: (2) No such file or directory > 2022-10-17T17:31:42.460+0000 7f03a73ce700 1 mgr finish mon failed to > return > metadata for osd.57: (2) No such file or directory > 2022-10-17T17:31:42.460+0000 7f03a6bcd700 1 mgr finish mon failed to > return > metadata for osd.58: (2) No such file or directory > 2022-10-17T17:31:43.587+0000 7f036d35f700 0 log_channel(cluster) log [DBG] > : pgmap v1006974: 656 pgs: 656 active+clean; 12 TiB data, 36 TiB used, 142 > TiB / 178 TiB avail > 2022-10-17T17:31:45.043+0000 7f035fb44700 -1 mgr get_metadata_python > Requested missing service osd.37 > 2022-10-17T17:31:45.589+0000 7f036d35f700 0 log_channel(cluster) log [DBG] > : pgmap v1006975: 656 pgs: 656 active+clean; 12 TiB data, > > -Brent > > -----Original Message----- > From: Eugen Block <eblock@xxxxxx> > Sent: Monday, October 17, 2022 12:52 PM > To: ceph-users@xxxxxxx > Subject: Re: Cephadm - Adding host to migrated cluster > > Does the cephadm.log on that node reveal anything useful? What about the > (active) mgr log? > > Zitat von Brent Kennedy <bkennedy@xxxxxxxxxx>: > > > Greetings everyone, > > > > > > > > We recently moved a ceph-ansible cluster running pacific on centos 8 > > to centos 8 stream and then upgraded to quincy using cephadm after > > converting to cephadm. Everything with the transition worked but > > recently we decided to add another node to the cluster with 10 more > > drives. We were able to go to the web interface and add the host ( > > with the IP and name ), which spun up the basic management containers > > on the new node. We then went to the OSD section to add the drives > > which were showing as available. They were all recognized, so the > > drives were added via the web console. Cephadm spun up the OSDs and > > that's where things are stuck. The OSDs show up in the cluster but > > are out now. They came up but were then marked down and later out. > > We purged them then zapped the drives and after about 10 minutes, > > cephadm had added them back automatically. It then did the same > > thing, showed them up, then down and put them out. When I look at > > "ceph osd tree", it shows the drives but they don't show up under any > > host ( they are on host osdserver6 ). I am trying to figure out why > > they are not being put under a host since the host server was added to > > cephadm and the server install checks with cephadm were good. The > maintenance containers are running on the host, no issues. Any ideas would > be greatly appreciated. > > > > > > > > -16 36.38199 host osdserver5 > > > > 20 ssd 3.63820 osd.20 up 1.00000 1.00000 > > > > 22 ssd 3.63820 osd.22 up 1.00000 1.00000 > > > > 23 ssd 3.63820 osd.23 up 1.00000 1.00000 > > > > 24 ssd 3.63820 osd.24 up 1.00000 1.00000 > > > > 44 ssd 3.63820 osd.44 up 1.00000 1.00000 > > > > 45 ssd 3.63820 osd.45 up 1.00000 1.00000 > > > > 46 ssd 3.63820 osd.46 up 1.00000 1.00000 > > > > 47 ssd 3.63820 osd.47 up 1.00000 1.00000 > > > > 48 ssd 3.63820 osd.48 up 1.00000 1.00000 > > > > 49 ssd 3.63820 osd.49 up 1.00000 1.00000 > > > > 37 0 osd.37 down 1.00000 1.00000 > > > > 50 0 osd.50 down 1.00000 1.00000 > > > > 51 0 osd.51 down 1.00000 1.00000 > > > > 52 0 osd.52 down 1.00000 1.00000 > > > > 53 0 osd.53 down 1.00000 1.00000 > > > > 54 0 osd.54 down 1.00000 1.00000 > > > > 55 0 osd.55 down 1.00000 1.00000 > > > > 56 0 osd.56 down 1.00000 1.00000 > > > > 57 0 osd.57 down 1.00000 1.00000 > > > > 58 0 osd.58 down 1.00000 1.00000 > > > > > > > > > > > > Regards, > > > > -Brent > > > > > > > > Existing Clusters: > > > > Test: Quincy 17.2.3 ( all virtual on nvme ) > > > > US Production(HDD): Octopus 15.2.16 with 11 osd servers, 3 mons, 4 > > gateways, > > 2 iscsi gateways > > > > UK Production(HDD): Nautilus 14.2.22 with 18 osd servers, 3 mons, 4 > > gateways, 2 iscsi gateways > > > > US Production(SSD): Quincy 17.2.3 Cephadm with 6 osd servers, 5 mons, > > 4 gateways, 2 iscsi gateways > > > > UK Production(SSD): Quincy 17.2.3 with 6 osd servers, 5 mons, 4 > > gateways > > > > > > > > > > > > _______________________________________________ > > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an > > email to ceph-users-leave@xxxxxxx > > > > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email > to ceph-users-leave@xxxxxxx > > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx > > _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx