Re: OSD: why perf stats not collect all counters like perf dump?

Xinying Song <songxinying.ftd@xxxxxxxxx> · Sun, 14 Aug 2022 11:03:17 +0800

All right, I've solved the OSD perf dump collection problem. In case
anyone runs into the same problem and finds this email, here is how I
set it up in Nautilus:

1. Modify the prometheus Python module in mgr; in my case, the file is
located at /usr/share/ceph/mgr/prometheus/module.py. Add
`prio_limit=0` as a parameter when calling the `get_all_perf_counters`
function.
2. Set mgr_stats_threshold = 0 in ceph.conf and restart mgr.
3. Enable the prometheus module with `ceph mgr module enable prometheus`
if it hasn't been enabled yet.

Now all OSD perf counters can be retrieved by querying port 9283 of the
mgr. I've been running this for days, and so far so good.
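
For anyone who wants to check the result without a full Prometheus
setup, here is a minimal Python sketch that fetches the text served on
port 9283 and keeps only the samples whose series name contains a given
substring. The host name, the /metrics path, the helper names and the
"throttle" substring are assumptions for illustration only; check the
actual metric names in your own output. It also assumes the exporter
writes one "series value" pair per line with no trailing timestamp.

```python
import urllib.request


def fetch_metrics(url, timeout=5.0):
    """Download the plain-text metrics page from the mgr prometheus module."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def filter_samples(text, needle):
    """Return (series, value) pairs whose series string contains `needle`.

    Assumes each sample line is "<name>{<labels>} <value>" with the value
    as the last space-separated token (no timestamp column).
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        if needle in series:
            samples.append((series, float(value)))
    return samples


# Example use (hypothetical host; not executed here):
#   text = fetch_metrics("http://mgr-host:9283/metrics")
#   for series, value in filter_samples(text, "throttle"):
#       print(series, value)
```

Filtering on a substring is only a quick way to eyeball the counters;
for alerting you would normally let Prometheus scrape the endpoint and
write rules there.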

PS: I'm still confused by code like OSDPerfMetricCollector. Maybe that
code is designed to solve issues other than collecting perf metrics?
I hope someone can tell me, thanks!

Best wishes.

Xinying Song <songxinying.ftd@xxxxxxxxx> wrote on Sun, 31 Jul 2022 at 16:58:
>
> Hi, everyone:
> I'm trying to monitor the OSD's throttlers, since our clusters have
> run into the throttler-full problem several times, and there was no
> warning message. By reading the code, I found an interface called
> OSD::get_perf_reports, which periodically sends OSD stats to mgr.
> What confuses me is that this function is implemented with a lot of
> dedicated code instead of using the 'perf dump' logic to get all stats.
>
> My question is: why not just collect all the information with the
> `perf dump` logic? That would not only simplify the code but also
> expose all perf counters to mgr, which would help in monitoring them.
>
> Thanks!
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx