On Apr 7, 2015, at 7:44 PM, Francois Lafont wrote:
> Chris Kitzmiller wrote:
>> I graph aggregate stats for `ceph --admin-daemon
>> /var/run/ceph/ceph-osd.$osdid.asok perf dump`. If the max latency strays too far
>> outside of my mean latency I know to go look for the troublemaker. My graphs
>> look something like this:
>>
>> [...]
>
> Thanks Chris for these interesting explanations.
> Sorry for my basic question but which is the entry in the output that gives
> you the read latency?
>
> Here is an example from my cluster (Firefly):
>
> ~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok perf
>
> [...]
>
> "osd": { "opq": 0,
>       "op_wip": 0,
>       "op": 3566,
>       "op_in_bytes": 208803635,
>       "op_out_bytes": 146962506,
>       "op_latency": { "avgcount": 3566,
>           "sum": 100.330695000},
>       "op_process_latency": { "avgcount": 3566,
>           "sum": 84.702772000},
>       "op_r": 471,
>       "op_r_out_bytes": 146851024,
>       "op_r_latency": { "avgcount": 471,
>           "sum": 1.329795000},
>
> [...]
>
> Is the value of "op_r_latency" (ie 1.329ms above)?
> In this case, I don't understand the meaning of "avgcount"
> and "sum".
>
> "sum" is the sum of what?
> "avgcount" is the average of what?

There are a bunch of these avgcount/sum pairs and, from what I've gleaned,
you're to simply divide sum by avgcount to get the mean of that particular
stat over whatever time frame it is measuring.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
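For what it's worth, here is a minimal Python sketch of the sum / avgcount division described in the reply, applied to the counters quoted above. The function names `mean_latencies` and `interval_mean` are made up for illustration and are not part of Ceph. The sketch assumes the latency sums are reported in seconds and that the counters accumulate from daemon start, so a mean over a graphing interval would come from the difference between two samples. Please treat both points as assumptions to check against your own cluster.

```python
import json


def mean_latencies(perf, section="osd"):
    """Return {counter: sum / avgcount} for every avgcount/sum pair in a section."""
    means = {}
    for name, value in perf.get(section, {}).items():
        # Plain counters such as "op" are integers; only the pairs are dicts.
        if isinstance(value, dict) and "avgcount" in value and "sum" in value:
            count = value["avgcount"]
            means[name] = value["sum"] / count if count else 0.0
    return means


def interval_mean(prev, curr):
    """Mean between two samples of one pair, assuming cumulative counters."""
    ops = curr["avgcount"] - prev["avgcount"]
    return (curr["sum"] - prev["sum"]) / ops if ops > 0 else 0.0


# Counters copied from the quoted output. In practice the JSON would come from
# `ceph --admin-daemon /var/run/ceph/ceph-osd.$osdid.asok perf dump`.
sample = json.loads("""
{"osd": {"op": 3566,
         "op_latency": {"avgcount": 3566, "sum": 100.330695000},
         "op_process_latency": {"avgcount": 3566, "sum": 84.702772000},
         "op_r": 471,
         "op_r_latency": {"avgcount": 471, "sum": 1.329795000}}}
""")

for name, mean in sorted(mean_latencies(sample).items()):
    print(f"{name}: {mean:.6f}")
```

With the numbers above, op_r_latency works out to 1.329795 / 471, roughly 0.0028. If the sum is indeed in seconds, that is about 2.8 ms per read averaged over all 471 reads, rather than 1.329 ms.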