Dashboard crash with rook/reef and external prometheus

I'm fairly new to the community, so I figured I'd ask about this here before creating an issue; I'm not sure how supported this configuration is.

I am running Rook v1.12.6 and Ceph 18.2.0. I've enabled the dashboard in the CephCluster CRD and it has been working for a while. However, the charts are empty.
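
For reference, the dashboard and monitoring settings in the CephCluster CRD can be checked with something like the following (the cluster name and the rook-ceph namespace are placeholders for my setup):

kubectl -n rook-ceph get cephcluster rook-ceph \
  -o jsonpath='{.spec.dashboard}{"\n"}{.spec.monitoring}{"\n"}'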

I do have Prometheus+Grafana running on my cluster, and I can access many of the Ceph metrics from there. With the upgrade to Reef I noticed that many of the Quincy dashboard elements have been replaced by charts, so I wanted to get those working.

I discovered that if I run ceph dashboard set-prometheus-api-host <url>, the charts are immediately populated (including historical data). However, when I do this I rapidly start getting Ceph health alerts due to a crashing mgr module. If I set the Prometheus API host back to '', the crashes stop accumulating, though this disables the charts.
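
For completeness, those commands are run from the rook-ceph-tools pod; the Prometheus service URL below is only an illustration of the kind of value I mean, not necessarily the exact one in my cluster:

ceph dashboard set-prometheus-api-host 'http://prometheus-server.monitoring.svc.cluster.local:80'
# clearing it again stops the new crashes, but also empties the charts:
ceph dashboard set-prometheus-api-host ''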

I am running the prometheus-community/prometheus-25.2.0 chart. Various published Ceph Grafana dashboards that I've found work fine against it.
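
(A quick way to confirm the Ceph metrics are reachable on that Prometheus from inside the cluster is a direct query against its HTTP API, e.g. the following; the service name and namespace are placeholders:)

curl -s 'http://prometheus-server.monitoring.svc.cluster.local:80/api/v1/query?query=ceph_health_status'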

The following are relevant dumps.  Please let me know if you have any ideas, or if I should go ahead and create an issue for this...
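
(For reference, the crash info below was pulled with the standard crash tooling; the usual commands are roughly:)

ceph health detail
ceph crash ls-new
ceph crash info <crash_id>
ceph crash archive-all    # acknowledges RECENT_CRASH warnings without deleting the reports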

mgr console output during crash:
debug 2023-10-24T15:11:23.498+0000 7fc81fa5d700 -1 mgr.server reply reply (2) No such file or directory This Orchestrator does not support `orch prometheus access info`
debug 2023-10-24T15:11:23.502+0000 7fc7ea3f3700  0 [dashboard INFO request] [::ffff:10.1.0.106:49760] [GET] [200] [0.012s] [admin] [101.0B] /api/health/get_cluster_capacity
debug 2023-10-24T15:11:23.502+0000 7fc813985700  0 [stats WARNING root] cmdtag  not found in client metadata
debug 2023-10-24T15:11:23.502+0000 7fc813985700  0 [stats WARNING root] cmdtag  not found in client metadata
debug 2023-10-24T15:11:23.502+0000 7fc7e83ef700  0 [dashboard INFO request] [::ffff:10.1.0.106:5580] [GET] [200] [0.011s] [admin] [73.0B] /api/osd/settings
debug 2023-10-24T15:11:23.506+0000 7fc85411a700  0 log_channel(audit) log [DBG] : from='mon.2 -' entity='mon.' cmd=[{"prefix": "balancer status", "format": "json"}]: dispatch
debug 2023-10-24T15:11:23.506+0000 7fc813985700  0 [stats WARNING root] cmdtag  not found in client metadata
debug 2023-10-24T15:11:23.506+0000 7fc7e9bf2700  0 [dashboard INFO request] [::ffff:10.1.0.106:20241] [GET] [200] [0.014s] [admin] [34.0B] /api/prometheus/rules
debug 2023-10-24T15:11:23.630+0000 7fc7ebbf6700  0 [dashboard INFO orchestrator] is orchestrator available: True,
debug 2023-10-24T15:11:23.734+0000 7fc7ebbf6700  0 [dashboard INFO orchestrator] is orchestrator available: True,
debug 2023-10-24T15:11:23.802+0000 7fc86511c700  0 log_channel(cluster) log [DBG] : pgmap v126: 617 pgs: 53 active+remapped+backfill_wait, 2 active+remapped+backfilling, 562 active+clean; 34 TiB data, 68 TiB used, 64 TiB / 132 TiB avail; 2.4 MiB/s rd, 93 KiB/s wr, 21 op/s; 1213586/22505781 objects misplaced (5.392%)
debug 2023-10-24T15:11:23.862+0000 7fc7ebbf6700  0 [dashboard INFO orchestrator] is orchestrator available: True,
debug 2023-10-24T15:11:23.962+0000 7fc7ebbf6700  0 [dashboard INFO orchestrator] is orchestrator available: True,
debug 2023-10-24T15:11:24.058+0000 7fc7ebbf6700  0 [dashboard INFO orchestrator] is orchestrator available: True,
debug 2023-10-24T15:11:24.158+0000 7fc7ebbf6700  0 [dashboard INFO orchestrator] is orchestrator available: True,
debug 2023-10-24T15:11:24.270+0000 7fc7ebbf6700  0 [dashboard INFO orchestrator] is orchestrator available: True,
debug 2023-10-24T15:11:24.546+0000 7fc7ebbf6700  0 [dashboard INFO orchestrator] is orchestrator available: True,
debug 2023-10-24T15:11:24.654+0000 7fc7ebbf6700  0 [dashboard INFO orchestrator] is orchestrator available: True,
debug 2023-10-24T15:11:24.654+0000 7fc7ebbf6700  0 [dashboard INFO request] [::ffff:10.1.0.106:13711] [GET] [200] [1.170s] [admin] [3.2K] /api/health/minimal
debug 2023-10-24T15:11:25.802+0000 7fc86511c700  0 log_channel(cluster) log [DBG] : pgmap v127: 617 pgs: 53 active+remapped+backfill_wait, 2 active+remapped+backfilling, 562 active+clean; 34 TiB data, 68 TiB used, 64 TiB / 132 TiB avail; 1.1 MiB/s rd, 53 KiB/s wr, 17 op/s; 1213586/22505781 objects misplaced (5.392%)
debug 2023-10-24T15:11:27.802+0000 7fc86511c700  0 log_channel(cluster) log [DBG] : pgmap v128: 617 pgs: 53 active+remapped+backfill_wait, 2 active+remapped+backfilling, 562 active+clean; 34 TiB data, 68 TiB used, 64 TiB / 132 TiB avail; 1.8 MiB/s rd, 58 KiB/s wr, 18 op/s; 1213586/22505781 objects misplaced (5.392%)
debug 2023-10-24T15:11:28.494+0000 7fc813985700  0 [stats WARNING root] cmdtag  not found in client metadata
debug 2023-10-24T15:11:28.498+0000 7fc7eb3f5700  0 [dashboard INFO request] [::ffff:10.1.0.106:20241] [GET] [200] [0.011s] [admin] [73.0B] /api/osd/settings
debug 2023-10-24T15:11:28.498+0000 7fc85411a700  0 log_channel(audit) log [DBG] : from='mon.2 -' entity='mon.' cmd=[{"prefix": "orch prometheus access info"}]: dispatch
debug 2023-10-24T15:11:28.502+0000 7fc7ec3f7700  0 [dashboard INFO request] [::ffff:10.1.0.106:5580] [GET] [200] [0.006s] [admin] [102.0B] /api/health/get_cluster_capacity
debug 2023-10-24T15:11:28.502+0000 7fc7e93f1700  0 [dashboard INFO request] [::ffff:10.1.0.106:44009] [GET] [200] [0.005s] [admin] [22.0B] /api/prometheus/notifications
debug 2023-10-24T15:11:28.502+0000 7fc81fa5d700 -1 Remote method threw exception: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 675, in get_prometheus_access_info
    raise NotImplementedError()
NotImplementedError


ceph crash info dump:
{
    "backtrace": [
        "  File \"/usr/share/ceph/mgr/orchestrator/_interface.py\", line 675, in get_prometheus_access_info\n    raise NotImplementedError()",
        "NotImplementedError"
    ],
    "ceph_version": "18.2.0",
    "crash_id": "2023-10-24T14:59:52.921427Z_08e06575-0431-47fe-afc5-be8e4a7d1144",
    "entity_name": "mgr.a",
    "mgr_module": "rook",
    "mgr_module_caller": "ActivePyModule::dispatch_remote get_prometheus_access_info",
    "mgr_python_exception": "NotImplementedError",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mgr",
    "stack_sig": "bbf52dcdbbe54d67edf59ebdb5d201fffd921db5a9dd4431964c2aaac2250c7e",
    "timestamp": "2023-10-24T14:59:52.921427Z",
    "utsname_hostname": "k8s4",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.0-87-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#97-Ubuntu SMP Mon Oct 2 21:09:21 UTC 2023"
}

--
Rich
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


