Re: Throughput metrics missing when updating Ceph Quincy to Reef

Eugen Block <eblock@xxxxxx> · Thu, 25 Jan 2024 23:17:52 +0000

Ah, there they are (different port):

reef01:~ # curl http://localhost:9926/metrics | grep ceph_osd_op | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  124k  100  124k    0     0   111M      0 --:--:-- --:--:-- --:--:--  121M
# HELP ceph_osd_op Client operations
# TYPE ceph_osd_op counter
ceph_osd_op{ceph_daemon="osd.1"} 25
ceph_osd_op{ceph_daemon="osd.4"} 543
ceph_osd_op{ceph_daemon="osd.5"} 12192
# HELP ceph_osd_op_delayed_degraded Count of ops delayed due to target object being degraded
# TYPE ceph_osd_op_delayed_degraded counter
ceph_osd_op_delayed_degraded{ceph_daemon="osd.1"} 0
ceph_osd_op_delayed_degraded{ceph_daemon="osd.4"} 0
ceph_osd_op_delayed_degraded{ceph_daemon="osd.5"} 0
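
When comparing the two endpoints it helps to count only the actual sample lines for a metric, since the "# HELP" and "# TYPE" lines show up in grep even when no values are exported. A minimal sketch, assuming plain Prometheus text output on stdin; the helper name count_samples is made up for illustration and is not part of Ceph:

```shell
# count_samples METRIC
# Reads Prometheus text-format metrics on stdin and prints how many sample
# lines exist for exactly METRIC. Comment lines (HELP/TYPE) are skipped, so
# a metric that is declared but has no values yields 0.
count_samples() {
    awk -v m="$1" '
        /^#/ { next }
        {
            name = $0
            sub(/[{ ].*/, "", name)   # drop labels and value, keep the metric name
            if (name == m) n++
        }
        END { print n + 0 }
    '
}

# Possible usage with the ports seen in this thread (-s hides the progress meter):
#   curl -s http://localhost:9283/metrics | count_samples ceph_osd_op
#   curl -s http://localhost:9926/metrics | count_samples ceph_osd_op
```

Matching on the exact metric name keeps ceph_osd_op_delayed_degraded and similar metrics out of the ceph_osd_op count.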

I can't check the dashboard right now, but I will definitely do that tomorrow.
Good night!

Quoting Eugen Block <eblock@xxxxxx>:

Yeah, it's mentioned in the upgrade docs [2]:

Monitoring & Alerting
      Ceph-exporter: Now the performance metrics for Ceph daemons  
are exported by ceph-exporter, which deploys on each daemon rather  
than using prometheus exporter. This will reduce performance  
bottlenecks.

[2] https://docs.ceph.com/en/latest/releases/reef/#major-changes-from-quincy

Quoting Eugen Block <eblock@xxxxxx>:

Hi,

I got those metrics back after setting:

reef01:~ # ceph config set mgr mgr/prometheus/exclude_perf_counters false

reef01:~ # curl http://localhost:9283/metrics | grep ceph_osd_op | head
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  324k  100  324k    0     0  72.5M      0 --:--:-- --:--:-- --:--:-- 79.1M
# HELP ceph_osd_op Client operations
# TYPE ceph_osd_op counter
ceph_osd_op{ceph_daemon="osd.0"} 139650.0
ceph_osd_op{ceph_daemon="osd.11"} 9711090.0
ceph_osd_op{ceph_daemon="osd.2"} 3864.0
ceph_osd_op{ceph_daemon="osd.1"} 25.0
ceph_osd_op{ceph_daemon="osd.4"} 543.0
ceph_osd_op{ceph_daemon="osd.5"} 12192.0
ceph_osd_op{ceph_daemon="osd.3"} 3661521.0
ceph_osd_op{ceph_daemon="osd.6"} 2030.0

I found the option in the docs [1]. The same section appears in the
Quincy docs as well, but my Quincy cluster has no such option, which
is probably why it still exports those performance counters:

quincy-1:~ # ceph config get mgr mgr/prometheus/exclude_perf_counters
Error ENOENT: unrecognized key 'mgr/prometheus/exclude_perf_counters'

Anyway, this should bring the metrics back the "legacy" way (I
guess). Apparently, the ceph-exporter daemon is now required on
your hosts to collect those metrics.
After adding the ceph-exporter service (ceph orch apply
ceph-exporter) and setting mgr/prometheus/exclude_perf_counters
back to "true", I see that "ceph_osd_op" metrics are defined but
have no values yet. I'm apparently still missing something; I'll
check tomorrow. But this could/should be in the upgrade docs IMO.
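
The two setups described above can be written down side by side. This is only a sketch of the commands already used in this message; the run wrapper, the DRY_RUN variable and the two function names are hypothetical and exist only so the steps can be printed before they are applied:

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is only printed, not run.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# "Legacy" way: let the mgr prometheus module export the perf counters again.
use_mgr_perf_counters() {
    run ceph config set mgr mgr/prometheus/exclude_perf_counters false
}

# Reef way: deploy ceph-exporter and keep perf counters out of the mgr module.
use_ceph_exporter() {
    run ceph orch apply ceph-exporter
    run ceph config set mgr mgr/prometheus/exclude_perf_counters true
}
```

For example, DRY_RUN=1 use_ceph_exporter prints the two commands without touching the cluster. Note that in the follow-up at the top of this thread the ceph-exporter metrics turned out to be served on port 9926 rather than 9283.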

Regards,
Eugen

[1]  
https://docs.ceph.com/en/latest/mgr/prometheus/#ceph-daemon-performance-counters-metrics

Quoting Martin <ceph@xxxxxxxxxxxxx>:

Hi,

Confirmed that this happens to me as well.
After upgrading from 18.2.0 to 18.2.1, OSD metrics like
ceph_osd_op_* are missing from ceph-mgr.

The Grafana dashboard also doesn't display all graphs correctly.

ceph-dashboard/Ceph - Cluster : Capacity used, Cluster I/O, OSD  
Capacity Utilization, PGs per OSD....

curl http://localhost:9283/metrics | grep -i ceph_osd_op
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38317  100 38317    0     0   9.8M      0 --:--:-- --:--:-- --:--:-- 12.1M

Before upgrading to Reef 18.2.1 I could get all the metrics.

Martin

On 18/01/2024 12:32, Jose Vicente wrote:
Hi,
After upgrading from Quincy to Reef, the ceph-mgr daemon no longer
exports some OSD throughput metrics such as ceph_osd_op_*:
curl http://localhost:9283/metrics | grep -i ceph_osd_op
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  295k  100  295k    0     0   144M      0 --:--:-- --:--:-- --:--:--  144M
However I can get other metrics like:
# curl http://localhost:9283/metrics | grep -i ceph_osd_apply
# HELP ceph_osd_apply_latency_ms OSD stat apply_latency_ms
# TYPE ceph_osd_apply_latency_ms gauge
ceph_osd_apply_latency_ms{ceph_daemon="osd.275"} 152.0
ceph_osd_apply_latency_ms{ceph_daemon="osd.274"} 102.0
...
Before upgrading to Reef (from Quincy) I could get all the
metrics. The MGR prometheus module is enabled.
Rocky Linux release 8.8 (Green Obsidian)
ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)
# netstat -nap | grep 9283
tcp        0      0 127.0.0.1:53834         127.0.0.1:9283          ESTABLISHED 3561/prometheus
tcp6       0      0 :::9283                 :::*                    LISTEN      804985/ceph-mgr
Thanks,
Jose C.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx