On 9/13/21 11:13 PM, Jeff Layton wrote:
On Mon, 2021-09-13 at 18:43 +0530, Venky Shankar wrote:
Right now, cumulative read/write/metadata latencies are tracked
and periodically forwarded to the MDS. These metrics are not
particularly useful. Much more useful are the average latency
and standard deviation (stdev), which is what this series of patches
aims to provide.
The userspace (libcephfs+tool) changes are here::
https://github.com/ceph/ceph/pull/41397
The math involved in keeping track of the average latency and stdev
seems incorrect, so this series fixes that up too (closely mimicking
how it's done in userspace, with some restrictions obviously) as per::
NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops)
NEW_STDEV = SQRT(((OLD_STDEV + (latency - OLD_AVG)*(latency - NEW_AVG)) / (total_ops - 1)))
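The two update rules above can be sketched in a few lines of Python. This is only an illustration, not the code from the series: the class and field names (LatencyStats, sq_sum) are hypothetical, and it assumes that the OLD_STDEV term in the formula stands for the running sum of squared deviations rather than the previously reported stdev, which is the reading under which the formula matches Welford's online algorithm. It also uses floating point for brevity, which the kernel side would presumably have to replace with integer arithmetic.

```python
import math


class LatencyStats:
    """Running mean and sample standard deviation of latencies."""

    def __init__(self):
        self.total_ops = 0
        self.avg = 0.0
        # Hypothetical name: running sum of squared deviations,
        # assumed to be what OLD_STDEV denotes in the formula.
        self.sq_sum = 0.0

    def update(self, latency):
        self.total_ops += 1
        old_avg = self.avg
        # NEW_AVG = OLD_AVG + ((latency - OLD_AVG) / total_ops)
        self.avg = old_avg + (latency - old_avg) / self.total_ops
        # Accumulate (latency - OLD_AVG) * (latency - NEW_AVG)
        self.sq_sum += (latency - old_avg) * (latency - self.avg)

    def stdev(self):
        # The sample stdev divides by (total_ops - 1), so it is
        # undefined for fewer than two samples; return 0.0 then.
        if self.total_ops < 2:
            return 0.0
        return math.sqrt(self.sq_sum / (self.total_ops - 1))
```

Feeding the samples one at a time through update() and then calling stdev() should agree with a two-pass computation over the same samples, without having to store them.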
Note that the cumulative latencies are still forwarded to the MDS, but
the tool (cephfs-top) ignores them altogether.
Venky Shankar (4):
ceph: use "struct ceph_timespec" for r/w/m latencies
ceph: track average/stdev r/w/m latency
ceph: include average/stddev r/w/m latency in mds metrics
ceph: use tracked average r/w/m latencies to display metrics in
debugfs
fs/ceph/debugfs.c | 12 +++----
fs/ceph/metric.c | 81 +++++++++++++++++++++++++----------------------
fs/ceph/metric.h | 64 +++++++++++++++++++++++--------------
3 files changed, 90 insertions(+), 67 deletions(-)
This looks reasonably sane. I'll plan to go ahead and pull this into the
testing kernels and do some testing with them. If anyone has objections
(Xiubo?) let me know and I can take them out.
Sorry, I think I missed this patch set.
Will comment on it in the V2.
Thanks.
Thanks,