Hi,

Chris Kitzmiller wrote:

> I graph aggregate stats for `ceph --admin-daemon
> /var/run/ceph/ceph-osd.$osdid.asok perf dump`. If the max latency strays too far
> outside of my mean latency I know to go look for the troublemaker. My graphs
> look something like this:
>
> [...]

Thanks, Chris, for these interesting explanations. Sorry for the basic
question, but which entry in the output gives you the read latency?
Here is an example from my cluster (Firefly):

~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok perf [...]
  "osd": { "opq": 0,
      "op_wip": 0,
      "op": 3566,
      "op_in_bytes": 208803635,
      "op_out_bytes": 146962506,
      "op_latency": { "avgcount": 3566, "sum": 100.330695000},
      "op_process_latency": { "avgcount": 3566, "sum": 84.702772000},
      "op_r": 471,
      "op_r_out_bytes": 146851024,
      "op_r_latency": { "avgcount": 471, "sum": 1.329795000},
[...]

Is it the value of "op_r_latency" (i.e. 1.329 ms above)? If so, I don't
understand the meaning of "avgcount" and "sum". "sum" is the sum of what?
"avgcount" is the average of what?

Thanks in advance.

-- 
François Lafont
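For what it's worth, here is a minimal Python sketch of how the "avgcount"/"sum" pairs quoted above could be turned into a mean latency. It rests on an assumption that is not confirmed anywhere in this message: that "sum" is the accumulated latency in seconds and "avgcount" is the number of operations that were summed. The helper names (`mean_latency_ms`, `osd_mean_latencies`) are made up for illustration, and the JSON fragment just repeats the three latency counters from the output above.

```python
import json

# Fragment of the "osd" section quoted above (values copied from the post).
PERF_DUMP = """
{"osd": {"op_latency": {"avgcount": 3566, "sum": 100.330695000},
         "op_process_latency": {"avgcount": 3566, "sum": 84.702772000},
         "op_r_latency": {"avgcount": 471, "sum": 1.329795000}}}
"""


def mean_latency_ms(counter):
    """Return the mean latency in milliseconds for one counter.

    ASSUMPTION: 'sum' is the total latency in seconds and 'avgcount'
    is the number of samples that were accumulated into it.
    """
    count = counter["avgcount"]
    if count == 0:
        return 0.0  # no operations recorded yet, avoid dividing by zero
    return counter["sum"] / count * 1000.0


def osd_mean_latencies(perf):
    """Map each avgcount/sum counter in the 'osd' section to its mean (ms)."""
    return {
        name: mean_latency_ms(value)
        for name, value in perf["osd"].items()
        if isinstance(value, dict) and "avgcount" in value and "sum" in value
    }


latencies = osd_mean_latencies(json.loads(PERF_DUMP))
for name, ms in sorted(latencies.items()):
    print(f"{name}: {ms:.3f} ms")
```

Under that assumption, "op_r_latency" above would be a mean read latency of roughly 2.8 ms (1.329795 s spread over 471 reads), not 1.329 ms. Since both fields are cumulative, graphing would presumably need the difference between two successive dumps rather than the raw values.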