Re: What are you doing to locate performance issues in a Ceph cluster?

Francois Lafont <flafdivers@xxxxxxx> · Wed, 08 Apr 2015 16:10:31 +0200

Chris Kitzmiller wrote:

>> ~# ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok perf
>>
>>  [...]
>>
>>  "osd": { "opq": 0,
>>      "op_wip": 0,
>>      "op": 3566,
>>      "op_in_bytes": 208803635,
>>      "op_out_bytes": 146962506,
>>      "op_latency": { "avgcount": 3566,
>>          "sum": 100.330695000},
>>      "op_process_latency": { "avgcount": 3566,
>>          "sum": 84.702772000},
>>      "op_r": 471,
>>      "op_r_out_bytes": 146851024,
>>      "op_r_latency": { "avgcount": 471,
>>          "sum": 1.329795000},
>>
>>   [...]
>>
>> Is the read latency the value of "op_r_latency" (i.e. 1.329 ms above)?
>> In this case, I don't understand the meaning of "avgcount"
>> and "sum".
>>
>> "sum" is the sum of what?
>> "avgcount" is the average of what?
> 
> There are a bunch of these avgcount/sum pairs and, from what I've gleaned, you're to simply divide sum by avgcount to get the mean of that particular stat over whatever time frame it is measuring.

Err... I'm sorry, I'm not sure I understand. If I take the values
of op_r_latency above, I get:

    sum/avgcount = 1.329795000/471 = 0.002823344

Would 0.002823344 ms be the latency of my read operations?
That seems impossible to me (unfortunately ;)), or maybe the unit is seconds?
In that case, 2.823344 ms would be a plausible value. In any case,
I don't understand the name "avgcount". The name "count" seems more
logical to me (but maybe I haven't really understood its meaning).
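
For reference, the division can be sketched in Python. This is only an
illustration: it hard-codes an excerpt of the counters quoted above, the
helper name mean_latency is made up for this example, and the conversion
to milliseconds assumes that "sum" is expressed in seconds, which is
exactly the point that still needs confirming.

```python
import json

# Excerpt of the counters quoted above, reduced to the fields used here.
PERF_DUMP = """
{"osd": {"op_r": 471,
         "op_r_latency": {"avgcount": 471, "sum": 1.329795000},
         "op_latency": {"avgcount": 3566, "sum": 100.330695000}}}
"""


def mean_latency(counter):
    """Return sum/avgcount, or None when no samples were recorded."""
    count = counter["avgcount"]
    if count == 0:
        return None
    return counter["sum"] / count


osd = json.loads(PERF_DUMP)["osd"]
mean_read = mean_latency(osd["op_r_latency"])

# Assumption: "sum" is in seconds, so multiply by 1000 to get milliseconds.
print(f"op_r_latency: {mean_read * 1000:.3f} ms per read")
```

Read this way, "avgcount" behaves like a plain sample count: it is the
number of values that were added into "sum".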

Looking at the source code in ./src/common/perf_counters.cc, it seems to me
that the number is indeed in seconds, but I'm (really) not a C++ expert.
Can someone confirm that?

Another thing: if I understand correctly, sum/avgcount is an average
latency computed since the start of the OSD daemon. So, after a long
time, the average becomes stable and hardly varies any more.
Is it possible to reset the counters? I noticed that restarting the daemon
works, but it's a little drastic.
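
A possible workaround that leaves the daemon alone is to take two
snapshots of the same counter and divide the difference of the sums by the
difference of the counts, which gives the average over that interval only.
A minimal sketch in Python, assuming both values only grow between
snapshots; the helper name interval_mean is made up, and the second
snapshot is invented purely for illustration.

```python
def interval_mean(before, after):
    """Mean over the samples recorded between two snapshots of one counter."""
    samples = after["avgcount"] - before["avgcount"]
    if samples <= 0:
        # No new samples, or the counters went backwards (daemon restarted).
        return None
    return (after["sum"] - before["sum"]) / samples


# First snapshot: the op_r_latency values quoted above.
earlier = {"avgcount": 471, "sum": 1.329795}
# Second snapshot: hypothetical values, not taken from a real cluster.
later = {"avgcount": 571, "sum": 1.829795}

recent = interval_mean(earlier, later)
# Assumption: "sum" is in seconds, hence the factor 1000 for milliseconds.
print(f"mean read latency over the interval: {recent * 1000:.3f} ms")
```

With this approach the long-running average no longer hides recent
variations, because each result only covers the operations completed
between the two snapshots.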

-- 
François Lafont