ceph-mgr ssh connections left open

Wyll Ingersoll <wyllys.ingersoll@xxxxxxxxxxxxxx> · Tue, 18 Jul 2023 14:56:12 +0000

Every night at midnight, our ceph-mgr daemons open ssh connections to the other nodes and then leave them open. Eventually they become zombies.
I cannot figure out which module is causing this or how to turn it off. If left unchecked over days or weeks, the zombie ssh connections keep accumulating, and the only way to clear them is to restart the ceph-mgr services.

Any idea what is causing this or how it can be disabled?

Example:

ceph     1350387 1350373  7 Jul17 ?        01:19:39 /usr/bin/ceph-mgr -n mgr.mon03 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix
ceph     1350548 1350387  0 Jul17 ?        00:00:01 ssh -C -F /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.11 sudo python
ceph     1350549 1350387  0 Jul17 ?        00:00:02 ssh -C -F /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.41 sudo python
ceph     1350550 1350387  0 Jul17 ?        00:00:01 ssh -C -F /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.42 sudo python
ceph     1350551 1350387  0 Jul17 ?        00:00:01 ssh -C -F /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.22 sudo python
ceph     1350552 1350387  0 Jul17 ?        00:00:01 ssh -C -F /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.23 sudo python
root     1350553     902  0 Jul17 ?        00:00:00 sshd: xxx [priv]
ceph     1350554 1350387  0 Jul17 ?        00:00:01 ssh -C -F /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.105 sudo pytho
ceph     1350556 1350387  0 Jul17 ?        00:00:01 ssh -C -F /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.21 sudo python
ceph     1350557 1350387  0 Jul17 ?        00:00:01 ssh -C -F /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.101 sudo pytho
ceph     1350559 1350387  0 Jul17 ?        00:00:01 ssh -C -F /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.102 sudo pytho
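
To confirm that the leftover connections really do pile up between restarts, the ssh children of the ceph-mgr process can be counted from `ps -ef` output. The sketch below is a minimal, hypothetical helper (`count_mgr_ssh_children` is not part of Ceph or cephadm). It assumes the standard `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD), and its sample rows are trimmed copies of the listing above.

```python
from collections import Counter


def count_mgr_ssh_children(ps_output: str, mgr_pid: int) -> Counter:
    """Count ssh processes whose parent PID is mgr_pid, keyed by user@host.

    Hypothetical helper; assumes `ps -ef` columns:
    UID PID PPID C STIME TTY TIME CMD...
    """
    targets: Counter = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines and rows too short to contain a command.
        if len(fields) < 8:
            continue
        try:
            ppid = int(fields[2])
        except ValueError:
            continue  # header row ("PPID") or a malformed line
        cmd = fields[7:]
        # Accept both "ssh" and an absolute path such as "/usr/bin/ssh".
        if ppid != mgr_pid or cmd[0].rsplit("/", 1)[-1] != "ssh":
            continue
        # The target is the first argument containing '@' (e.g. xxx@10.4.1.11).
        target = next((arg for arg in cmd[1:] if "@" in arg), "unknown")
        targets[target] += 1
    return targets


# Trimmed sample rows in `ps -ef` format, copied from the listing above.
SAMPLE = """\
UID          PID    PPID  C STIME TTY          TIME CMD
ceph     1350387 1350373  7 Jul17 ?        01:19:39 /usr/bin/ceph-mgr -n mgr.mon03 -f
ceph     1350548 1350387  0 Jul17 ?        00:00:01 ssh -C -o ServerAliveInterval=7 xxx@10.4.1.11 sudo python
ceph     1350549 1350387  0 Jul17 ?        00:00:02 ssh -C -o ServerAliveInterval=7 xxx@10.4.1.41 sudo python
root     1350553     902  0 Jul17 ?        00:00:00 sshd: xxx [priv]
"""

if __name__ == "__main__":
    counts = count_mgr_ssh_children(SAMPLE, mgr_pid=1350387)
    print(sum(counts.values()), dict(counts))
```

Running something like this once a day (for example from cron, feeding it the real `ps -ef` output and the mgr PID from `pgrep ceph-mgr`) and comparing the totals would show whether the count grows after each midnight run.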

Our current lists of always-on (default) and enabled ceph-mgr modules are:

    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "cephadm",
        "dashboard",
        "diskprediction_local",
        "nfs",
        "prometheus",
        "restful"
    ],

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx