Re: What are you doing to locate performance issues in a Ceph cluster?

Francois Lafont <flafdivers@xxxxxxx> · Wed, 08 Apr 2015 01:44:19 +0200

Hi,

Chris Kitzmiller wrote:

> I graph aggregate stats for `ceph --admin-daemon 
> /var/run/ceph/ceph-osd.$osdid.asok perf dump`. If the max latency strays too far 
> outside of my mean latency I know to go look for the troublemaker. My graphs 
> look something like this:
>
> [...]

Thanks, Chris, for these interesting explanations.
Sorry for the basic question, but which entry in the output gives
you the read latency?

Here is an example from my cluster (Firefly):

~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok perf dump

  [...]

  "osd": { "opq": 0,
      "op_wip": 0,
      "op": 3566,
      "op_in_bytes": 208803635,
      "op_out_bytes": 146962506,
      "op_latency": { "avgcount": 3566,
          "sum": 100.330695000},
      "op_process_latency": { "avgcount": 3566,
          "sum": 84.702772000},
      "op_r": 471,
      "op_r_out_bytes": 146851024,
      "op_r_latency": { "avgcount": 471,
          "sum": 1.329795000},

   [...]

Is it the value of "op_r_latency" (i.e. the 1.329 above)?
If so, I don't understand the meaning of "avgcount"
and "sum".

"sum" is the sum of what?
"avgcount" is the average of what?
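
My tentative reading, which is only an assumption on my part and not
something I have confirmed, is that "sum" is the total time accumulated
over all operations (in seconds) and "avgcount" is the number of
operations measured, so that the mean latency would be sum / avgcount.
A minimal Python sketch of that reading, using the values above (the
helper name mean_latency is my own, not part of Ceph):

```python
import json

# Excerpt of the "osd" section from the perf dump output above.
PERF_EXCERPT = """
{"osd": {"op_latency": {"avgcount": 3566, "sum": 100.330695000},
         "op_r": 471,
         "op_r_latency": {"avgcount": 471, "sum": 1.329795000}}}
"""


def mean_latency(counter):
    """Return sum / avgcount, or 0.0 if nothing has been counted yet.

    Assumption: "sum" is accumulated time in seconds and "avgcount"
    is the number of samples that went into it.
    """
    count = counter["avgcount"]
    return counter["sum"] / count if count else 0.0


osd = json.loads(PERF_EXCERPT)["osd"]
read_ms = mean_latency(osd["op_r_latency"]) * 1000.0
all_ms = mean_latency(osd["op_latency"]) * 1000.0

print("mean read latency: %.3f ms" % read_ms)
print("mean op latency:   %.3f ms" % all_ms)
```

If that reading is right, the read latency on this OSD would be roughly
2.8 ms on average since the counters were last reset, not 1.329 ms.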

Thanks in advance.

-- 
François Lafont