Thanks Frédéric,

Going through your steps helped me narrow down the issue. Oddly, it
looks to be a network issue with the new host. Most things connect okay
(ssh, ping), but when the data stream gets too big, the connections
just hang. And it seems to be host specific, as the other storage hosts
are all functioning fine.

Removing that machine from the cluster also seems to solve the
orchestrator service problems. Not sure why that was jamming it up, but
Ceph is working normally and I can focus on tracking down network
errors instead.

Thanks again!

-Mike
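P.S. In case it helps anyone who finds this thread later: "small
packets work, large transfers hang" is the classic signature of an MTU
mismatch (e.g. jumbo frames enabled on the host but not on the switch
port), so that's where I'll start looking. As a rough sketch, assuming
a 9000-byte MTU on the cluster network, a do-not-fragment ping shows
whether full-size frames actually make it through (payload size = MTU
minus 28 bytes of IP and ICMP headers):

    # check the configured MTU on each interface
    ip link show

    # 8972 = 9000 - 28; must succeed host-to-host if jumbo frames work
    ping -M do -s 8972 <other-storage-host>

    # 1472 = 1500 - 28; baseline check for a standard MTU
    ping -M do -s 1472 <other-storage-host>

If the small ping works and the large one fails, something on the path
is dropping or refusing jumbo frames.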
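P.P.S. For anyone working through Frédéric's steps below: the export in
step 3 should show a spec along these lines. This is just a minimal
label-based example (the service_id and values are illustrative, not
from my cluster):

    service_type: osd
    service_id: osd_using_labels
    placement:
      label: osds
    spec:
      data_devices:
        all: true

If the placement or data_devices filters don't match the new host or
its drives, the orchestrator won't deploy OSDs there.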
>>>>> On Wed, 24 Apr 2024 08:46:14 +0200 (CEST), Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> said:

FN> Hello Michael,
FN>
FN> You can try this:
FN>
FN> 1/ check that the host shows up in ceph orch host ls with the right label 'osds'
FN>
FN> 2/ check that the host is OK with ceph cephadm check-host <hostname>. It should look like:
FN>
FN> <hostname> (None) ok
FN> podman (/usr/bin/podman) version 4.6.1 is present
FN> systemctl is present
FN> lvcreate is present
FN> Unit chronyd.service is enabled and running
FN> Hostname "<hostname>" matches what is expected.
FN> Host looks OK
FN>
FN> 3/ double check your service_type 'osd' with ceph orch ls --service-type osd --export
FN> It should show the correct placement and spec (drive sizes, etc.)
FN>
FN> 4/ enable debugging with ceph config set mgr mgr/cephadm/log_to_cluster_level debug
FN>
FN> 5/ open a terminal and observe ceph -W cephadm --watch-debug
FN>
FN> 6/ ceph mgr fail
FN>
FN> 7/ ceph orch device ls --hostname=<hostname> --wide --refresh (should
FN> show local block devices as Available and trigger the creation of the
FN> OSDs)
FN>
FN> If your service_type 'osd' is correct, the orchestrator should deploy OSDs on the node.
FN> If it does not, look for the reason why in the ceph -W cephadm --watch-debug output.
FN>
FN> Regards,
FN> Frédéric.
FN>
FN> ----- On 24 Apr 24, at 3:22, Michael Baer ceph@xxxxxxxxxxxxxxx wrote:
FN>
>> Hi,
>>
>> This problem started with trying to add a new storage server to a
>> quincy v17.2.6 ceph cluster. Whatever I did, I could not add the
>> drives on the new host as OSDs: via the dashboard, via cephadm shell,
>> or by setting osd unmanaged to false.
>>
>> But what I started realizing is that the orchestrator will also no
>> longer automatically manage services. I.e., if a service is set to be
>> managed by labels, removing and adding labels to different hosts for
>> that service has no effect. Same if I set a service to be managed via
>> hostnames. Same if I try to drain a host (the services/podman
>> containers just keep running). I am able to add/rm services via
>> 'cephadm shell ceph orch daemon add/rm', but Ceph will not manage
>> them automatically using labels/hostnames.
>>
>> This apparently includes OSD daemons. I cannot create any on the new
>> host, either automatically or manually, but I'm hoping the service
>> and OSD issues are related and not two separate issues.
>>
>> I haven't been able to find any obvious errors in /var/log/ceph,
>> /var/log/syslog, podman logs <container>, etc. I have been able to
>> get 'slow ops' errors on monitors by trying to add OSDs manually (and
>> having to restart the monitor). I've also gotten cephadm shell to
>> hang, and I've had to restart managers. I'm not an expert and it
>> could be something obvious, but I haven't been able to figure out a
>> solution. If anyone has any suggestions, I would greatly appreciate
>> them.
>>
>> Thanks,
>> Mike
>>
>> --
>> Michael Baer
>> ceph@xxxxxxxxxxxxxxx

--
Michael Baer
ceph@xxxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx