All right, I've solved the OSD perf dump collection problem. In case anyone runs into the same problem and finds this email, here is how I set it up in Nautilus:

1. Modify the prometheus Python module in mgr. In my case the file is located at /usr/share/ceph/mgr/prometheus/module.py. Add `prio_limit=0` as a parameter to the `get_all_perf_counters` call.
2. Set mgr_stats_threshold = 0 in ceph.conf and restart mgr.
3. Enable the prometheus module with `ceph mgr module enable prometheus` if it hasn't been enabled yet.

Now all OSD perf results can be retrieved by querying port 9283 on the mgr. I've been running this for days, and so far so good.

PS: I'm still confused by code such as OSDPerfMetricCollector. Maybe that code is designed to solve other problems besides collecting perf metrics? I hope someone can tell me. Thanks!

Best wishes.

Xinying Song <songxinying.ftd@xxxxxxxxx> wrote on Sun, Jul 31, 2022 at 16:58:
>
> Hi, everyone:
> I'm trying to monitor the OSD throttlers, since our clusters have
> run into full throttlers several times and there was no warning
> message. Reading the code, I found an interface called
> OSD::get_perf_reports, which periodically sends OSD stats to the
> mgr. What confuses me is that this function is implemented with a
> lot of dedicated code instead of reusing the 'perf dump' logic to
> get all stats.
>
> My question is: why not just collect all the information through the
> `perf dump` logic? That would not only simplify the code but also
> expose all perf counters to the mgr, which would make it possible to
> monitor those counters.
>
> Thanks!
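
For anyone who wants to check the result of the setup above from a script, here is a minimal sketch that reads the exporter output on port 9283 and keeps only the samples whose metric name contains a given substring. The helper names (`parse_samples`, `fetch_metrics`) and the metric names in `SAMPLE` are made up for illustration; the real counter names depend on your cluster and version, so look at the actual /metrics output first.

```python
import re
import urllib.request

# Matches one sample line of the Prometheus text format, e.g.
#   metric_name{label="value"} 123.0
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_samples(text, needle):
    """Return (name, labels, value) for samples whose name contains needle."""
    found = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match and needle in match.group(1):
            name, labels, value = match.groups()
            found.append((name, labels or "", float(value)))
    return found


def fetch_metrics(host, port=9283, timeout=10):
    """Fetch the raw metrics text from the mgr prometheus module."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Hypothetical sample output; these metric names are assumptions,
# not names taken from a real cluster.
SAMPLE = """\
# HELP ceph_throttle_example_val example counter
# TYPE ceph_throttle_example_val gauge
ceph_throttle_example_val{ceph_daemon="osd.0"} 12.0
ceph_throttle_example_max{ceph_daemon="osd.0"} 100.0
ceph_osd_op_r{ceph_daemon="osd.0"} 4242.0
"""

throttle_samples = parse_samples(SAMPLE, "throttle")
```

Against a live cluster, something like `parse_samples(fetch_metrics("mgr-host"), "throttle")` should work, where "mgr-host" is a placeholder for the active mgr. The parser is deliberately simple and ignores optional timestamps after the value.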