Hi,
I am a graduate student at the University of Wisconsin, Madison. I have been trying to understand the recovery mechanisms in Ceph and had a question about collecting metrics accurately. My previous email may have been overlooked, so I am sending a note again.
Any help regarding this would be highly appreciated.
Thank You,
Surabhi Gupta
From: Surabhi GUPTA
Sent: Wednesday, July 14, 2021 5:47:47 PM
To: dev@xxxxxxx <dev@xxxxxxx>
Subject: Seeking advice regarding collecting better client and recovery throughput metrics
Hi,
I have been running experiments to measure client I/O throughput and recovery throughput in a Ceph cluster, but I am unsure whether I am collecting the metrics correctly. Could you please tell me whether this is the right approach, or whether I can do anything better to collect more accurate statistics?
To generate load on the cluster, I use the rados bench utility and plot the avg MB/s and cur MB/s values it reports. For recovery, I periodically query the perf dump for each OSD and look at the recovery_ops and recovery_bytes counters. I then calculate the recovery throughput from the difference between the values returned by successive queries, divided by the time elapsed between those queries.
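A minimal sketch of this delta-based calculation is below. It assumes each sample holds the recovery_ops and recovery_bytes values read from one OSD's perf dump, together with a timestamp taken when the query was made. The Sample type and the recovery_rate function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Sample:
    """One reading of an OSD's cumulative recovery counters (hypothetical type)."""
    timestamp: float      # seconds, e.g. from time.monotonic() at query time
    recovery_ops: int     # cumulative counter read from the perf dump
    recovery_bytes: int   # cumulative counter read from the perf dump


def recovery_rate(prev: Sample, curr: Sample) -> Optional[Tuple[float, float]]:
    """Return (ops per second, MiB per second) between two samples.

    Returns None if a counter went backwards, which would suggest the
    counters were reset between the two queries (for example by a restart).
    """
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("samples must be taken at increasing times")

    d_ops = curr.recovery_ops - prev.recovery_ops
    d_bytes = curr.recovery_bytes - prev.recovery_bytes
    if d_ops < 0 or d_bytes < 0:
        return None

    return d_ops / dt, d_bytes / dt / (1024 * 1024)


if __name__ == "__main__":
    before = Sample(timestamp=0.0, recovery_ops=100, recovery_bytes=0)
    after = Sample(timestamp=10.0, recovery_ops=150, recovery_bytes=100 * 1024 * 1024)
    print(recovery_rate(before, after))
```

The result is an average over the sampling interval, so a shorter interval gives values closer to instantaneous throughput at the cost of more frequent queries. Summing the per-OSD rates would give a cluster-wide figure.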
I also saw that the cluster status output displays client IOPS and recovery IOPS, so another option is to query "ceph -s" periodically, extract these values, and use them for the analysis.
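This status-based approach could be sketched as follows, assuming the status is requested in JSON form and that the rates appear in a pgmap section. The field names used here (read_bytes_sec, write_bytes_sec, read_op_per_sec, write_op_per_sec, recovering_bytes_per_sec, recovering_objects_per_sec) are an assumption about that output and should be checked against the release in use. The function name rates_from_status is hypothetical.

```python
import json


def rates_from_status(status_json: str) -> dict:
    """Extract client and recovery rates from a JSON status document.

    Assumption: the rates live under a "pgmap" key with the field names
    below, and a field may be absent when its rate is zero, so every
    lookup defaults to 0.
    """
    pgmap = json.loads(status_json).get("pgmap", {})
    return {
        "client_bytes_per_sec": pgmap.get("read_bytes_sec", 0)
        + pgmap.get("write_bytes_sec", 0),
        "client_ops_per_sec": pgmap.get("read_op_per_sec", 0)
        + pgmap.get("write_op_per_sec", 0),
        "recovery_bytes_per_sec": pgmap.get("recovering_bytes_per_sec", 0),
        "recovery_objects_per_sec": pgmap.get("recovering_objects_per_sec", 0),
    }


if __name__ == "__main__":
    # Hand-written sample input, not captured from a real cluster.
    sample = json.dumps({
        "pgmap": {
            "write_bytes_sec": 4096,
            "write_op_per_sec": 2,
            "recovering_bytes_per_sec": 1048576,
        }
    })
    print(rates_from_status(sample))
```

In practice the input string would come from running the status command with JSON output at a fixed interval and recording a timestamp alongside each result.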
Could you please tell me which of these is the better way to obtain these metrics, in the sense of which one exposes more accurate instantaneous throughput values?
Is there any other method apart from these two that I should be looking at?
I would greatly appreciate any help regarding this!
Thank You,
Surabhi Gupta