Thanks Frédéric,

Going through your steps helped me narrow down the issue. Oddly, it
looks to be a network issue with the new host. Most things connect okay
(ssh, ping), but when the data stream gets too big, the connections
just hang. And it seems to be host specific, as the other storage hosts
are all functioning fine.

Removing that machine from the cluster also seems to solve the
orchestrator service problems. Not sure why that was jamming it up, but
Ceph is working normally and I can focus on tracking down network
errors instead.

Thanks again!

-Mike
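P.S. In case it helps anyone who finds this thread later: "small
packets work, large transfers hang" is the classic signature of an MTU
mismatch (e.g. jumbo frames enabled on the host but not on the switch
port), so that's where I'll start looking. As a rough sketch, assuming
a 9000-byte MTU on the cluster network, a do-not-fragment ping shows
whether full-size frames actually make it through (payload size = MTU
minus 28 bytes of IP and ICMP headers):

    # check the configured MTU on each interface
    ip link show

    # 8972 = 9000 - 28; must succeed host-to-host if jumbo frames work
    ping -M do -s 8972 <other-storage-host>

    # 1472 = 1500 - 28; baseline check for a standard MTU
    ping -M do -s 1472 <other-storage-host>

If the small ping works and the large one fails, something on the path
is dropping or refusing jumbo frames.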
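P.P.S. For anyone working through Frédéric's steps below: the export in
step 3 should show a spec along these lines. This is just a minimal
label-based example (the service_id and values are illustrative, not
from my cluster):

    service_type: osd
    service_id: osd_using_labels
    placement:
      label: osds
    spec:
      data_devices:
        all: true

If the placement or data_devices filters don't match the new host or
its drives, the orchestrator won't deploy OSDs there.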
>>>>> On Wed, 24 Apr 2024 08:46:14 +0200 (CEST), Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> said:

FN> Hello Michael,
FN>
FN> You can try this:
FN>
FN> 1/ check that the host shows up in ceph orch host ls with the right label 'osds'
FN>
FN> 2/ check that the host is OK with ceph cephadm check-host <hostname>. It should look like:
FN>
FN> <hostname> (None) ok
FN> podman (/usr/bin/podman) version 4.6.1 is present
FN> systemctl is present
FN> lvcreate is present
FN> Unit chronyd.service is enabled and running
FN> Hostname "<hostname>" matches what is expected.
FN> Host looks OK
FN>
FN> 3/ double check your service_type 'osd' with ceph orch ls --service-type osd --export
FN> It should show the correct placement and spec (drive sizes, etc.)
FN>
FN> 4/ enable debugging with ceph config set mgr mgr/cephadm/log_to_cluster_level debug
FN>
FN> 5/ open a terminal and observe ceph -W cephadm --watch-debug
FN>
FN> 6/ ceph mgr fail
FN>
FN> 7/ ceph orch device ls --hostname=<hostname> --wide --refresh (should
FN> show local block devices as Available and trigger the creation of the
FN> OSDs)
FN>
FN> If your service_type 'osd' is correct, the orchestrator should deploy OSDs on the node.
FN> If it does not, look for the reason why in the ceph -W cephadm --watch-debug output.
FN>
FN> Regards,
FN> Frédéric.
FN>
FN> ----- On 24 Apr 24, at 3:22, Michael Baer ceph@xxxxxxxxxxxxxxx wrote:
FN>
>> Hi,
>>
>> This problem started with trying to add a new storage server to a
>> quincy v17.2.6 ceph cluster. Whatever I did, I could not add the
>> drives on the new host as OSDs: via the dashboard, via cephadm shell,
>> or by setting osd unmanaged to false.
>>
>> But what I started realizing is that the orchestrator will also no
>> longer automatically manage services. I.e., if a service is set to be
>> managed by labels, removing and adding labels to different hosts for
>> that service has no effect. Same if I set a service to be managed via
>> hostnames. Same if I try to drain a host (the services/podman
>> containers just keep running). I am able to add/rm services via
>> 'cephadm shell ceph orch daemon add/rm', but Ceph will not manage
>> them automatically using labels/hostnames.
>>
>> This apparently includes OSD daemons. I cannot create any on the new
>> host, either automatically or manually, but I'm hoping the service
>> and OSD issues are related and not two separate issues.
>>
>> I haven't been able to find any obvious errors in /var/log/ceph,
>> /var/log/syslog, podman logs <container>, etc. I have been able to
>> get 'slow ops' errors on monitors by trying to add OSDs manually (and
>> having to restart the monitor). I've also gotten cephadm shell to
>> hang, and I've had to restart managers. I'm not an expert and it
>> could be something obvious, but I haven't been able to figure out a
>> solution. If anyone has any suggestions, I would greatly appreciate
>> them.
>>
>> Thanks,
>> Mike
>>
>> --
>> Michael Baer
>> ceph@xxxxxxxxxxxxxxx

--
Michael Baer
ceph@xxxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx