ceph-mgr high CPU utilization

I'm wondering whether anyone still sees ceph-mgr consuming excessive CPU and becoming unresponsive, even on recent Nautilus releases.  We recently upgraded our largest cluster, which has about 3500 OSDs, from Mimic to Nautilus (14.2.8).  Since then ceph-mgr has been constantly at 100-200% CPU (1-2 cores), and it becomes unresponsive after a few minutes.  The finisher-Mgr queue length keeps growing (I've seen it above 100k), which matches the symptoms many people reported with earlier Nautilus releases.  This is what it looks like after an hour of running:

    "finisher-Mgr": {
        "queue_len": 66078,
        "complete_latency": {
            "avgcount": 21,
            "sum": 2098.408767721,
            "avgtime": 99.924227034
        }
    },
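To put the counters above in perspective, here is a small sketch that parses the finisher-Mgr block and recomputes the mean completion latency as sum / avgcount. It assumes the fragment comes from a JSON perf-counter dump of the manager; the outer braces and the `summarize_finisher` helper are hypothetical additions for illustration only.

```python
import json

# The finisher-Mgr block from the post, wrapped in braces so it parses as a
# standalone JSON object (assumption: it was taken from a mgr perf-counter dump).
PERF_FRAGMENT = """
{
    "finisher-Mgr": {
        "queue_len": 66078,
        "complete_latency": {
            "avgcount": 21,
            "sum": 2098.408767721,
            "avgtime": 99.924227034
        }
    }
}
"""


def summarize_finisher(perf: dict, name: str = "finisher-Mgr") -> dict:
    """Hypothetical helper: queue length and mean completion latency (seconds)."""
    counters = perf[name]
    lat = counters["complete_latency"]
    # avgtime should equal sum / avgcount; recompute it as a sanity check.
    mean = lat["sum"] / lat["avgcount"] if lat["avgcount"] else 0.0
    return {
        "queue_len": counters["queue_len"],
        "completed": lat["avgcount"],
        "mean_latency_s": mean,
    }


summary = summarize_finisher(json.loads(PERF_FRAGMENT))
print(summary)
```

Read this way, the counters show only 21 completed items, each taking roughly 100 seconds on average, against a backlog of 66,078 queued items, so the finisher is draining far more slowly than work arrives.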

We have a pretty vanilla manager config; only the balancer is enabled, in upmap mode.  Here are the enabled modules:

    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator_cli",
        "progress",
        "rbd_support",
        "status",
        "volumes"
    ],
    "enabled_modules": [
        "restful"
    ],
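When narrowing down which module might be responsible, it helps to look at the effective set: the always-on modules plus the explicitly enabled ones. The sketch below assumes the listing has the JSON shape shown above (as printed by `ceph mgr module ls`, reproducing only these two keys); the `active_modules` helper is hypothetical.

```python
import json

# Module listing from the post (assumption: JSON as printed by
# `ceph mgr module ls`; only the two relevant keys are reproduced).
MODULE_LS = """
{
    "always_on_modules": [
        "balancer", "crash", "devicehealth", "orchestrator_cli",
        "progress", "rbd_support", "status", "volumes"
    ],
    "enabled_modules": ["restful"]
}
"""


def active_modules(ls: dict) -> list:
    """Hypothetical helper: union of always-on and explicitly enabled modules."""
    always_on = set(ls.get("always_on_modules", []))
    enabled = set(ls.get("enabled_modules", []))
    return sorted(always_on | enabled)


modules = active_modules(json.loads(MODULE_LS))
print(modules)
```

Of these, only restful appears in the opt-in list; the rest are loaded as always-on modules.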

Any ideas or outstanding issues in this area?

Andras
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



